from:"Erik Corry"

Re: Proposal: Expose offsets for capturing groups in regular expression matches

2017-03-23 Thread Erik Corry

This would be great.  Can I suggest that both the start and end of each
match be there?  So instead of "offsets" you would have "starts" and
"ends".  Alternatively, "offsets" could be twice as long, holding
start-end pairs.
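
For illustration only, here is roughly what start-end pairs per group look like. The sketch assumes an engine that implements the `d` (hasIndices) regexp flag, which was standardized later and is not part of the proposal as posted; the helper name `groupOffsets` is made up for this example.

```javascript
// Hypothetical helper: returns one [start, end] pair per capturing group.
// Relies on the `d` flag, which makes exec() populate `match.indices`.
function groupOffsets(source, input) {
  const match = new RegExp(source, 'd').exec(input);
  if (match === null) return null;
  // indices[0] is the whole match; indices[n] is group n, or undefined
  // when that group did not take part in the match.
  return match.indices.map((pair) => (pair ? [pair[0], pair[1]] : undefined));
}

const offsets = groupOffsets('a(b+)(c)?', 'xxabbd');
```

Each pair is a half-open range, so `input.slice(start, end)` gives back the text of the group.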

On Mon, Oct 31, 2016 at 9:53 AM, Sebastian Zartner <
sebastianzart...@gmail.com> wrote:

> Hello together,
>
> for advanced processing of capturing groups in regular expression, I'd
> like to propose to expose their offsets within the results of executing an
> expression on a string.
>
> The complete proposal can be found at
> https://github.com/SebastianZ/es-proposal-regexp-capturing-group-offsets.
>
> I'd like it to be added to the Stage 0 proposals and
> I'm asking for feedback and a champion to help me bring it into shape and
> get it into the standard.
>
> Thank you in advance,
>
> Sebastian
>
> ___
> es-discuss mailing list
> es-discuss@mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
>
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Backward running version look-behinds

2015-11-25 Thread Erik Corry

This is great stuff, thanks for doing this.

I couldn't see any bugs in it, though I must admit that 21.2.2.4 part 4
made my head hurt, so I skipped it.

Just to prove I actually read it, I'll point out that "independant" is
spelled "independent".

On Tue, Nov 24, 2015 at 6:35 PM, Claude Pache 
wrote:

>
> On 20 Nov 2015, at 15:41, Nozomu Katō wrote:
>
> I was expecting that ES6 would come with look-behinds, because a
> proposal had been put at:
>
> http://web.archive.org/web/20121114071428/http://wiki.ecmascript.org/doku.php?id=harmony:proposals
>
> However, ES6 does not support them. I noticed that the link to the
> proposal had been struck-through:
>
> http://web.archive.org/web/20150812143714/http://wiki.ecmascript.org/doku.php?id=harmony:proposals
>
> I wondered what was a problem. I did research to know the situation
> about look-behinds, and I found this post:
> https://mail.mozilla.org/pipermail/es-discuss/2013-October/033911.html
>
> I realised that a spec needed to be written by someone, but "someone"
> had not appeared yet. Thus, I wrote a spec, subscribed to es-discuss,
> and posted the spec. What made me decide to post that spec was this
> post and thread.
>
> But now, it turns out that look-behinds similar to the proposal that has
> been struck-through have been implemented experimentally in Chromium and
> Gecko. I am confused about the ongoing situation.
>
> I am NOT an objector against .NET-compatible look-behinds. But I wonder
> if there is someone who writes a spec for them. I have no idea how the
> behaviours of look-behinds based on the .NET implementation are
> described in the language used by the ECMAScript spec. Introducing an
> internal direction switch might be a relatively simple way, but I have
> no concrete idea even about it.
>
> Nozomu
>
>
>
> I've amended the spec in order to add .NET-style lookbehinds. It proved to
> be indeed relatively simple, once you get how it works. Here is the result
> (with diffs):
>
> http://claudepache.github.io/ecma262/#sec-pattern
>
> The most difficult part was to manage to output the token ` ecmarkdown :-P
>
> —Claude
>
>
>

Re: Backward running version look-behinds

2015-11-24 Thread Erik Corry

I can speak only for myself.  I like the .Net-style lookbehinds, and I hope
they will be part of the standard.  For something to be in the standard we
need both implementations and someone to describe the desired behaviour in
the standards document.  It looks like implementations are being found.
Hopefully someone can write the document so this can move forward.

On Fri, Nov 20, 2015 at 3:41 PM, Nozomu Katō  wrote:

> I was expecting that ES6 would come with look-behinds, because a
> proposal had been put at:
>
> http://web.archive.org/web/20121114071428/http://wiki.ecmascript.org/doku.php?id=harmony:proposals
>
> However, ES6 does not support them. I noticed that the link to the
> proposal had been struck-through:
>
> http://web.archive.org/web/20150812143714/http://wiki.ecmascript.org/doku.php?id=harmony:proposals
>
> I wondered what was a problem. I did research to know the situation
> about look-behinds, and I found this post:
> https://mail.mozilla.org/pipermail/es-discuss/2013-October/033911.html
>
> I realised that a spec needed to be written by someone, but "someone"
> had not appeared yet. Thus, I wrote a spec, subscribed to es-discuss,
> and posted the spec. What made me decide to post that spec was this
> post and thread.
>
> But now, it turns out that look-behinds similar to the proposal that has
> been struck-through have been implemented experimentally in Chromium and
> Gecko. I am confused about the ongoing situation.
>
> I am NOT an objector against .NET-compatible look-behinds. But I wonder
> if there is someone who writes a spec for them. I have no idea how the
> behaviours of look-behinds based on the .NET implementation are
> described in the language used by the ECMAScript spec. Introducing an
> internal direction switch might be a relatively simple way, but I have
> no concrete idea even about it.
>
> Nozomu

Re: Look-behind proposal in trouble

2015-11-11 Thread Erik Corry

And here's a similar playground for .Net, not by me:

http://www.regexplanet.com/advanced/dotnet/index.html

On Tue, Nov 10, 2015 at 11:08 AM, Erik Corry <erik.co...@gmail.com> wrote:

> I made a playground where you can try out regexps with lookbehind.
>
> https://dartpad.dartlang.org/8feea83c01ab767acdf1
>
> On Tue, Oct 13, 2015 at 9:24 PM, Nozomu Katoo <noz...@akenotsuki.com>
> wrote:
>
>> Erik Corry wrote on Tue, 13 Oct 2015 at 11:18:48 +0200:
>> > Yes, that makes sense.
>> >
>> > This could be fixed by removing {n} loops from positive lookbehinds.
>> Or by
>> > doing the .NET-style back-references immediately.
>>
>> Personally, I am reluctant to remove any feature from the current
>> proposal intentionally for a future proposal that it is uncertain
>> whether it really comes or not. It might end up only making lookbehinds
>> of ECMAScript and ones of Perl 5 incompatible.
>>
>> >> On 10/10/2015 03:48, Erik Corry wrote:
>> >>
>> >>>
>> >>> On Sat, Oct 10, 2015 at 12:47 AM, Waldemar Horwat wrote:
>> >>>
>> >>> It's not a superset.  Captures would match differently.
>> >>>
>> >>>
>> >>> Can you elaborate?  How would they be different?
>> >>>
>> >>
>> >> If you have a capture inside a loop (controlled, say, by {n}), one of
>> the
>> >> proposals would capture the first instance, while the other proposal
>> would
>> >> capture the last instance.
>>
>> I was missing that point. I just confirmed that
>>
>>   perl -e "$a = 'abcdef'; $a =~ /(?<=.(.){2}.)./; print $1;"
>>
>> returned 'c' whereas .NET returned 'b'. Implementation based on my
>> proposal would return the same result as Perl 5.
>>
>>
>> By the way, at one point in this thread, I moved some email addresses
>> from To to Cc when sending my reply. But somehow several of them had
>> disappeared from the Cc field in the delivered email while they all
>> remain in a copy in my sent-email folder. I apologize to those who
>> received disconnected emails in this thread.
>>
>> Regards,
>>   Nozomu
>>
>>
>

Re: Look-behind proposal in trouble

2015-11-10 Thread Erik Corry

I made a playground where you can try out regexps with lookbehind.

https://dartpad.dartlang.org/8feea83c01ab767acdf1

On Tue, Oct 13, 2015 at 9:24 PM, Nozomu Katoo <noz...@akenotsuki.com> wrote:

> Erik Corry wrote on Tue, 13 Oct 2015 at 11:18:48 +0200:
> > Yes, that makes sense.
> >
> > This could be fixed by removing {n} loops from positive lookbehinds.  Or
> by
> > doing the .NET-style back-references immediately.
>
> Personally, I am reluctant to remove any feature from the current
> proposal intentionally for a future proposal that it is uncertain
> whether it really comes or not. It might end up only making lookbehinds
> of ECMAScript and ones of Perl 5 incompatible.
>
> >> On 10/10/2015 03:48, Erik Corry wrote:
> >>
> >>>
> >>> On Sat, Oct 10, 2015 at 12:47 AM, Waldemar Horwat wrote:
> >>>
> >>> It's not a superset.  Captures would match differently.
> >>>
> >>>
> >>> Can you elaborate?  How would they be different?
> >>>
> >>
> >> If you have a capture inside a loop (controlled, say, by {n}), one of
> the
> >> proposals would capture the first instance, while the other proposal
> would
> >> capture the last instance.
>
> I was missing that point. I just confirmed that
>
>   perl -e "$a = 'abcdef'; $a =~ /(?<=.(.){2}.)./; print $1;"
>
> returned 'c' whereas .NET returned 'b'. Implementation based on my
> proposal would return the same result as Perl 5.
>
>
> By the way, at one point in this thread, I moved some email addresses
> from To to Cc when sending my reply. But somehow several of them had
> disappeared from the Cc field in the delivered email while they all
> remain in a copy in my sent-email folder. I apologize to those who
> received disconnected emails in this thread.
>
> Regards,
>   Nozomu
>
>

Re: Look-behind proposal in trouble

2015-10-13 Thread Erik Corry

Yes, that makes sense.

This could be fixed by removing {n} loops from positive lookbehinds.  Or by
doing the .NET-style back-references immediately.
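
To make the capture difference concrete: ECMAScript later gained lookbehind with right-to-left matching inside the assertion. Assuming an engine with that support, the sketch below contrasts an ordinary forward match with the lookbehind from the Perl one-liner quoted in this thread.

```javascript
// Forward matching: the group inside {2} keeps its last iteration in
// left-to-right order, so it ends up holding 'c'.
const forward = /.(.){2}./.exec('abcd');

// Inside a lookbehind the engine walks right to left, so the last iteration
// it performs is the leftmost one, and the group ends up holding 'b'.
const behind = /(?<=.(.){2}.)./.exec('abcdef');

const captures = { forward: forward[1], behind: behind[1] };
```

The forward result corresponds to what Perl printed for the lookbehind, and the backward result to what .NET printed.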

On Mon, Oct 12, 2015 at 10:01 PM, Waldemar Horwat <walde...@google.com>
wrote:

> On 10/10/2015 03:48, Erik Corry wrote:
>
>>
>>
>> On Sat, Oct 10, 2015 at 12:47 AM, Waldemar Horwat <walde...@google.com
>> <mailto:walde...@google.com>> wrote:
>>
>> It's not a superset.  Captures would match differently.
>>
>>
>> Can you elaborate?  How would they be different?
>>
>
> If you have a capture inside a loop (controlled, say, by {n}), one of the
> proposals would capture the first instance, while the other proposal would
> capture the last instance.
>
> Waldemar
>
>

Re: Look-behind proposal in trouble

2015-10-12 Thread Erik Corry

Just for the lulz I ran the tests I could find from perl5 (which I think is
very similar to the proposal here) and the captures were identical when
using .Net-style reverse capturing.  It's not a huge number of tests,
though.

On Sat, Oct 10, 2015 at 12:48 PM, Erik Corry <erik.co...@gmail.com> wrote:

>
>
> On Sat, Oct 10, 2015 at 12:47 AM, Waldemar Horwat <walde...@google.com>
> wrote:
>>
>> It's not a superset.  Captures would match differently.
>
>
> Can you elaborate?  How would they be different?
>
> --
> Erik Corry
>

Re: Look-behind proposal in trouble

2015-10-10 Thread Erik Corry

On Sat, Oct 10, 2015 at 12:47 AM, Waldemar Horwat <walde...@google.com>
wrote:
>
> It's not a superset.  Captures would match differently.


Can you elaborate?  How would they be different?

-- 
Erik Corry

Re: Look-behind proposal in trouble

2015-10-09 Thread Erik Corry

I made an implementation of .NET-style variable length lookbehinds.  It's
not in a JS engine, but it's in a very simple (and very slow)
ES5-compatible regexp engine that is used in the tiny Dart implementation
named Fletch.

No Unicode issues arise since this engine does not support /u, but I
wouldn't expect any even with /u, since it's not trying to second-guess the
length of the string matched by an expression.

Needs a lot more tests, but it seems to work OK and was surprisingly simple
to do.  Basically:

* All steps in the input string are reversed, so if you would step forwards
you step backwards.
* Check for start of string instead of end of string.
* Test against the character to the left of the cursor instead of to the
right.
* The parts of the Alternative (see the regexp grammar in the standard) are
code-generated in reverse order.
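
The four points above can be sketched with a toy matcher. This is not the Fletch code from the review link, just a minimal illustration that handles a flat sequence of single-unit atoms; the names `lit`, `any`, `matchForward` and `matchBackward` are invented for the example.

```javascript
// Each atom is a predicate on one code unit.
const lit = (ch) => (c) => c === ch;
const any = (c) => c !== '\n';

// Forward: test the unit to the right of the cursor and step right.
function matchForward(atoms, input, pos) {
  for (const atom of atoms) {
    if (pos >= input.length) return -1;        // end of string
    if (!atom(input[pos])) return -1;
    pos += 1;
  }
  return pos;
}

// Backward: atoms are visited in reverse order, the unit to the left of the
// cursor is tested, and the bound check is against the start of the string.
function matchBackward(atoms, input, pos) {
  for (let i = atoms.length - 1; i >= 0; i--) {
    if (pos <= 0) return -1;                   // start of string
    if (!atoms[i](input[pos - 1])) return -1;
    pos -= 1;
  }
  return pos;
}

const abc = [lit('a'), any, lit('c')];
const endOfMatch = matchForward(abc, 'xabcx', 1);
const startOfMatch = matchBackward(abc, 'xabcx', 4);
```

Both functions return the new cursor position, or -1 on failure. Neither needs to know the length of what it is about to match, which is why variable-length lookbehind falls out for free.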

Code is here: https://codereview.chromium.org/1398033002/


On Wed, Oct 7, 2015 at 9:08 PM, Brian Terlson 
wrote:

> Sebastian,
>
>
>
> You can follow the tc39/ecma262 github repository for updates on
> proposals. It also contains information about our process.
>
>
>
> *From:* Sebastian Zartner [mailto:sebastianzart...@gmail.com]
> *Sent:* Monday, October 5, 2015 10:56 PM
> *To:* Nozomu Katō 
> *Cc:* Brian Terlson ; es-discuss Mozilla <
> es-discuss@mozilla.org>; Gorkem Yakin 
> *Subject:* Re: Look-behind proposal in trouble
>
>
>
> Hi together,
>
> Brian, where can people get the information about the reasons of such
> decisions (besides asking) and more generally about the processes behind
> the ES development?
>
> I was following Nozomu's proposal[1] closely, though to me it looked like
> the progress on this just died out.
>
> Nonetheless, great to hear that new champions could be found!
>
>
>
> Sebastian
>
> [1] https://mail.mozilla.org/pipermail/es-discuss/2015-May/042910.html
> 
>
>
>
> On 5 October 2015 at 23:42, Nozomu Katō  wrote:
>
> Hello Brian,
>
> I thank you very much indeed for your email and bringing really good
> news! I thought that my proposal might not be able to move forward
> anymore.
>
> I am also thankful that you searched for a new champion and Gorkem
> undertakes this proposal!
>
> Regards,
>   Nozomu
>
>
> Brian Terlson wrote on Mon, 5 Oct 2015, at 20:29:18 +:
> > Hi Nozomu,
> >
> > Brendan has indeed discovered he doesn't have time to champion the
> > proposal through TC39, so I removed it while I searched for a new
> > champion. Good news on that front - I have found one! Gorkem Yakin
> > works on the Chakra team and is available to help move this proposal
> > forward. I will also help out where I can. I've added the proposal
> > back to the stage 0 list!
> >
> > Thanks,
> > Brian
>

Re: Look-behind proposal in trouble

2015-10-09 Thread Erik Corry

I'm not convinced that the current proposal is easier to implement than the
real thing.  Take a look at the patch; it's trivial.

The lack of variable length lookbehind is a big annoyance in most
languages.  Search for the term and you'll find lots of frustrated perl
users.

On the other hand I don't think adding variable length lookbehind to the
spec makes it any easier to optimize /.+$/.

-- 
Erik Corry

Re: Look-behind proposal in trouble

2015-10-07 Thread Erik Corry

Your proposal for look-behind relies on being able to count the match
length of the look-behind in order to step back that far.  This presupposes
that atoms like . and character classes have a fixed length.

However, with the /u flag, the . and some character classes can be either one
or two code units.  This means you don't know how far to step back.  This
needs to be fixed in a way that is not incompatible with the "correct" .NET
way of doing things.
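
A quick way to see the variable width, assuming an engine that supports the /u flag. The astral code point U+1F639 is used here only as a convenient example of a surrogate pair.

```javascript
const astral = '\u{1F639}';          // one code point, two UTF-16 code units

// Length, in code units, of what a single . consumes under /u.
const unitsPerDot = [
  /^./u.exec('x')[0].length,         // BMP character
  /^./u.exec(astral)[0].length,      // astral character
];

// Without /u the same . only ever consumes one code unit.
const wholeStringWithoutU = /^.$/.test(astral);
const wholeStringWithU = /^.$/u.test(astral);
```

So a lookbehind containing one . cannot be turned into a fixed number of code units to step back.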

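Eg matching /a.(? [...] cat-face-with-tears-of-joy, which is a surrogate
pair).  The back reference has an apparent width of 3, so we step back 3
code units, but that hits the 'a', not the 'x' and so the back reference
fails to spot the 'x'.

On Sun, Oct 4, 2015 at 1:52 PM, Nozomu Katō <noz...@akenotsuki.com> wrote: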

> Apparently my proposal for adding the look-behind assertions to RegExp
> has been in trouble. I would like to ask anyone for help.
>
> The following story is what I know about the proposal after my previous
> post:
>
> I created a pull request for the proposal in July and sent an email to
> Brendan Eich asking if I can put his name as a champion:
> https://github.com/tc39/ecma262/pull/48
>
> I have not received a reply to my email, but I received a notification
> email in September that replying to the pull request, the proposal was
> moved to stage 0. Today, however, I just noticed that the proposal had
> been dropped from stage 0, stating "RegExp lookbehind has no champion".
> https://github.com/tc39/ecma262/commits/master/stage0.md (Oct 4, 2015)
>
> I am uncertain about what happened. Does this mean that Brendan Eich is
> no longer a champion or did not take a champion on from the beginning or
> ...?
>
>
> Regards,
>   Nozomu

Re: Look-behind proposal in trouble

2015-10-07 Thread Erik Corry

Oops forgot the /u on the regexp in the example.

On Wed, Oct 7, 2015 at 10:06 AM, Erik Corry <erik.co...@gmail.com> wrote:

> Your proposal for look-behind relies on being able to count the match
> length of the look-behind in order to step back that far.  This presupposes
> that atoms like . and character classes have a fixed length.
>
> However, with the /u flag, the . and some character classes can be either
> 1 or two code units.  This means you don't know how far to step back.  This
> needs to be fixed in a way that is not incompatible with the "correct" .NET
> way of doing things.
>
> Eg matching /a.(? cat-face-with-tears-of-joy, which is a surrogate pair).  The back reference
> has an apparent width of 3, so we step back 3 code units, but that hits the
> 'a', not the 'x' and so the back reference fails to spot the 'x'.
>
>
> On Sun, Oct 4, 2015 at 1:52 PM, Nozomu Katō <noz...@akenotsuki.com> wrote:
>
>> Apparently my proposal for adding the look-behind assertions to RegExp
>> has been in trouble. I would like to ask anyone for help.
>>
>> The following story is what I know about the proposal after my previous
>> post:
>>
>> I created a pull request for the proposal in July and sent an email to
>> Brendan Eich asking if I can put his name as a champion:
>> https://github.com/tc39/ecma262/pull/48
>>
>> I have not received a reply to my email, but I received a notification
>> email in September that replying to the pull request, the proposal was
>> moved to stage 0. Today, however, I just noticed that the proposal had
>> been dropped from stage 0, stating "RegExp lookbehind has no champion".
>> https://github.com/tc39/ecma262/commits/master/stage0.md (Oct 4, 2015)
>>
>> I am uncertain about what happened. Does this mean that Brendan Eich is
>> no longer a champion or did not take a champion on from the beginning or
>> ...?
>>
>>
>> Regards,
>>   Nozomu

Re: Look-behind proposal in trouble

2015-10-07 Thread Erik Corry

The proposal needs to be clarified to explain that you are stepping back a
number of code points, not units.  This implies that you are inspecting the
input string as you step backwards.  Also it should be explained what to do
if there are unpaired surrogates in the input string and inside the
lookbehind expression source.
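
Stepping back by code points means looking at the input while stepping. A minimal sketch follows; it treats an unpaired surrogate as a code point of its own, which is one possible policy and not something the proposal states. The function name is made up.

```javascript
const isLead = (u) => u >= 0xd800 && u <= 0xdbff;
const isTrail = (u) => u >= 0xdc00 && u <= 0xdfff;

// Returns the index one code point to the left of `pos`, or -1 when the
// cursor is already at the start of the string.
function stepBackCodePoint(input, pos) {
  if (pos <= 0) return -1;
  const last = input.charCodeAt(pos - 1);
  if (pos >= 2 && isTrail(last) && isLead(input.charCodeAt(pos - 2))) {
    return pos - 2;                  // a well-formed pair: skip both halves
  }
  return pos - 1;                    // BMP unit or unpaired surrogate
}

const sample = 'xa\u{1F639}';        // 4 code units, 3 code points
const steps = [
  stepBackCodePoint(sample, 4),
  stepBackCodePoint(sample, 2),
  stepBackCodePoint(sample, 1),
];
```

Stepping back three code points from the end of `sample` lands on index 0, where a count of three code units would have landed on index 1.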

I think the proposal would benefit from a pointer to an implementation or
two.  Of course the implementations should also fully support /u.

On Wed, Oct 7, 2015 at 11:10 AM, Claude Pache <claude.pa...@gmail.com>
wrote:

> This should not be a problem: With the /u flag, you work with code points,
> not code units. In particular, the `.` matches always a sequence (of code
> points with /u, or code units otherwise) of length 1.
>
> —Claude
>
>
>
> On 7 Oct 2015, at 10:08, Erik Corry <erik.co...@gmail.com> wrote:
>
> Oops forgot the /u on the regexp in the example.
>
> On Wed, Oct 7, 2015 at 10:06 AM, Erik Corry <erik.co...@gmail.com> wrote:
>
>> Your proposal for look-behind relies on being able to count the match
>> length of the look-behind in order to step back that far.  This presupposes
>> that atoms like . and character classes have a fixed length.
>>
>> However, with the /u flag, the . and some character classes can be either
>> 1 or two code units.  This means you don't know how far to step back.  This
>> needs to be fixed in a way that is not incompatible with the "correct" .NET
>> way of doing things.
>>
>> Eg matching /a.(?> cat-face-with-tears-of-joy, which is a surrogate pair).  The back reference
>> has an apparent width of 3, so we step back 3 code units, but that hits the
>> 'a', not the 'x' and so the back reference fails to spot the 'x'.
>>
>>
>> On Sun, Oct 4, 2015 at 1:52 PM, Nozomu Katō <noz...@akenotsuki.com>
>> wrote:
>>
>>> Apparently my proposal for adding the look-behind assertions to RegExp
>>> has been in trouble. I would like to ask anyone for help.
>>>
>>> The following story is what I know about the proposal after my previous
>>> post:
>>>
>>> I created a pull request for the proposal in July and sent an email to
>>> Brendan Eich asking if I can put his name as a champion:
>>> https://github.com/tc39/ecma262/pull/48
>>>
>>> I have not received a reply to my email, but I received a notification
>>> email in September that replying to the pull request, the proposal was
>>> moved to stage 0. Today, however, I just noticed that the proposal had
>>> been dropped from stage 0, stating "RegExp lookbehind has no champion".
>>> https://github.com/tc39/ecma262/commits/master/stage0.md (Oct 4, 2015)
>>>
>>> I am uncertain about what happened. Does this mean that Brendan Eich is
>>> no longer a champion or did not take a champion on from the beginning or
>>> ...?
>>>
>>>
>>> Regards,
>>>   Nozomu

Re: Q: Lonely surrogates and unicode regexps

2015-01-31 Thread Erik Corry

I think it's problematic that this is being standardized without a single
implementation.

On Wed, Jan 28, 2015 at 11:57 AM, André Bargull andre.barg...@udo.edu
wrote:

  On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä marja at chromium.org wrote:

  The ES6 unicode regexp spec is not very clear regarding what should happen
  if the regexp or the matched string contains lonely surrogates (a lead
  surrogate without a trail, or a trail without a lead). For example, for the
  . operator, the relevant parts of the spec speak about characters:

 Just a bit of terminology.

 The term character is overloaded, so Unicode provides the unambiguous
 term code point. For example, U+0378 is not (currently) an encoded
 character according to Unicode, but it would certainly be a terrible idea
 to disregard it, or not match it. It is a reserved code point that may be
 assigned as an encoded character in the future. So both U+D83D and U+0378
 are not characters.

 If a ES spec uses the term character instead of code point, then at
 some point in the text it needs to disambiguate what is meant.


 character is defined in 21.2.2 Pattern Semantics [1]:

 In the context of describing the behaviour of a BMP pattern “character”
 means a single 16-bit Unicode BMP code point. In the context of describing
 the behaviour of a Unicode pattern “character” means a UTF-16 encoded code
 point.



 [1]
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Erik Corry

On Wed, Jan 28, 2015 at 11:45 AM, Mathias Bynens math...@qiwi.be wrote:


  On 28 Jan 2015, at 11:36, Marja Hölttä ma...@chromium.org wrote:
 
  For example, the current version of Mathias’s ES6 Unicode regular
 expression transpiler ( https://mothereff.in/regexpu ) converts /a.b/u
 into
 /a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/
 and afaics it’s not yet fully consistent wrt lonely surrogates, so, a
 consistent implementation is going to be more complex than this.

 This is indeed an incomplete solution. The lack of lookbehind support in
 ES makes this hard to transpile correctly. Ideas welcome!


I don't think your transpiler can work without lookbehind.  If you could
guarantee that none of your transpiled regexp matches a substring that ends
in the middle of a pair, then I think you could get it right without
lookbehind, but consider:

/(...)-\1./.test("TxL-TxLT");

Where L stands for a lead surrogate, and T stands for a trailing
surrogate.  There's no way to stop the backreference from swallowing the
last L, and without lookbehind there is no way to stop the . from matching
the final T.  A second issue is having a match that starts in the middle of
a pair. You could test for this after the matching if JS gave you the index
of the match in the string, but I don't think it does.
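
On the last point, the result of RegExp.prototype.exec does carry an `index` property, so a check after the match is possible in principle. A minimal sketch, with a made-up helper name:

```javascript
// True when `index` falls between the two halves of a surrogate pair.
function splitsPair(input, index) {
  if (index <= 0 || index >= input.length) return false;
  const before = input.charCodeAt(index - 1);
  const after = input.charCodeAt(index);
  return before >= 0xd800 && before <= 0xdbff &&
         after >= 0xdc00 && after <= 0xdfff;
}

const input = 'a\uD83D\uDE39b';
// A non-/u pattern that happens to start on the trail half of the pair.
const match = /[\uDC00-\uDFFF]b/.exec(input);
const startsMidPair = splitsPair(input, match.index);
```

A transpiler could reject such a match and retry from the next position, at some cost in speed.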

Ignoring the start-of-match-in-the-middle-of-a-pair issue, and the
backreferences case, I think you can do without the lookbehind.
Assuming the lonely-surrogates-are-a-character scenario, the period (.)
transpiles to (ignore spaces added for readability):

(?:  \L(?!\T)  | \L\T  |  \T  |  [^\L\T\N])

where \L means leading surrogates, \T means trailing surrogates, \N means
all newlines.  Whatever comes before the . is not allowed to match a half

As an optimization, .x can transpile to (?: \L\T | . )x where the x stands
in for any literal characters.
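
Spelled out with real ranges, the lonely-surrogates-are-a-character expansion of . might look like the sketch below. The variable names are illustrative, and \N is taken here to mean the four ECMAScript line terminators.

```javascript
const L = '[\\uD800-\\uDBFF]';               // leading surrogates
const T = '[\\uDC00-\\uDFFF]';               // trailing surrogates
const N = '\\n\\r\\u2028\\u2029';            // line terminators

// (?:  \L(?!\T)  | \L\T  |  \T  |  [^\L\T\N])
const dot = `(?:${L}(?!${T})|${L}${T}|${T}|[^\\uD800-\\uDFFF${N}])`;

// Transpiled form of /a.b/u, anchored to make the checks below strict.
const transpiled = new RegExp(`^a${dot}b$`);

const results = {
  pair: transpiled.test('a\uD83D\uDE39b'),
  loneLead: transpiled.test('a\uD83Db'),
  loneTrail: transpiled.test('a\uDE39b'),
  bmp: transpiled.test('axb'),
  newline: transpiled.test('a\nb'),
};
```

This only covers the pattern side; it does nothing about matches that start or end in the middle of a pair.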

For a JS engine implementor, like Marja, it is of course possible to add
1-character negative lookbehind (\b already has elements of this).  Then
your in-engine transpiler turns . into

(?:  \L(?!\T)  | \L\T  |  (?!\L)\T  |  [^\L\T\N])

Which is going to be truly horrible in terms of code size and performance.
It's not like the period operator is a rare thing in a regexp, and other
common things like [^a-z] and [^\d] will expand into similar horrors.

On the other hand, in the lonely-surrogates-match-nothing scenario, the .
transpiles to

(?: \L\T  |  [^\L\T\N] )

which is quite a lot nicer and faster.  In this scenario, .x expands to
(?: \L\T | [^\T\L\N] )x  which still has no lookaheads and lookbehinds.
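
For comparison, a sketch of the lonely-surrogates-match-nothing expansion with real ranges, under the same assumptions as before (illustrative names, \N taken as the four line terminators):

```javascript
const lead = '[\\uD800-\\uDBFF]';
const trail = '[\\uDC00-\\uDFFF]';
const newlines = '\\n\\r\\u2028\\u2029';

// (?: \L\T  |  [^\L\T\N] )
const strictDot = `(?:${lead}${trail}|[^\\uD800-\\uDFFF${newlines}])`;
const strictRe = new RegExp(`^a${strictDot}b$`);

const strictResults = {
  pair: strictRe.test('a\uD83D\uDE39b'),
  loneLead: strictRe.test('a\uD83Db'),
  loneTrail: strictRe.test('a\uDE39b'),
  bmp: strictRe.test('axb'),
};
```

There are two alternatives instead of four and no lookahead, which is where the size and speed advantage comes from.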

Re: Code points vs Unicode scalar values

2013-09-20 Thread Erik Corry

On Wed, Sep 11, 2013 at 12:40 PM, Anne van Kesteren ann...@annevk.nl wrote:

 On Tue, Sep 10, 2013 at 8:14 AM, Mathias Bynens math...@qiwi.be wrote:
  FWIW, here’s a real-world example of a case where this behavior is
 annoying/unexpected to developers: http://cirw.in/blog/node-unicode

 That seems like a serious bug in V8 though. A utf-8 encoder should
 never ever generate CESU-8 byte sequences.


Just to be clear, V8 does not generate CESU-8 if you give it well formed
UTF-16.

If you give it broken UTF-16 with unpaired surrogates you can either break
the data or emit CESU-8.  In the first case, you overwrite the unpaired
surrogates with some sort of error character code.  In the second case you
can generate three-byte UTF-8 sequences that are not strictly legal.  The
second option will preserve the data if you round-trip it into V8 again (or
feed it to other apps that are liberal in what they accept), so that's what
V8 currently does.
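
The two options can be illustrated as follows. This is not V8's code: the first half uses the standard TextEncoder, which takes the replace-with-an-error-character route, and the second half is a made-up helper that applies the ordinary three-byte UTF-8 layout to a surrogate code unit.

```javascript
// Option 1: replace the unpaired surrogate. TextEncoder writes U+FFFD
// (bytes EF BF BD) in its place, so the original value is lost.
const replaced = Array.from(new TextEncoder().encode('\uD83D'));

// Option 2 (illustrative helper): encode the surrogate code unit itself
// with the three-byte bit layout. The bytes are not strictly legal UTF-8,
// but a liberal decoder can recover the original code unit.
function encodeSurrogateUnit(unit) {
  return [
    0xe0 | (unit >> 12),
    0x80 | ((unit >> 6) & 0x3f),
    0x80 | (unit & 0x3f),
  ];
}
const preserved = encodeSurrogateUnit(0xd83d);
```

Here `preserved` comes out as ED A0 BD, from which 0xD83D can be reconstructed, whereas EF BF BD says nothing about what was there before.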

-- 
Erik Corry



Re: strawman for the := operator

2012-08-08 Thread Erik Corry

From the straw man:

class Point {
   constructor(x,y) {
  this := {x,y}  //define and initialize x and y properties of new object
   }
}

should this read:

this := {x: x, y: y}

?
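
As far as the object literal itself goes, the ES6 drafts include shorthand property names, so the two spellings should build the same object. A small sketch using plain functions (the := operator is left out, since it is only a strawman):

```javascript
function makePoint(x, y) {
  return { x, y };                 // shorthand property names
}

function makePointLonghand(x, y) {
  return { x: x, y: y };           // explicit key: value pairs
}

const short = makePoint(1, 2);
const long = makePointLonghand(1, 2);
```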

On 7 August 2012 00:44, Allen Wirfs-Brock al...@wirfs-brock.com wrote:
 Based upon discussions last week in the July 25, 2012 - TC39 Meeting Notes
 thread, I've created a new strawman proposal for a := operator.  See
 http://wiki.ecmascript.org/doku.php?id=strawman:define_properties_operator

 := is a convenient way to copy properties from one object to another or to
 extend an object with new properties.  It combines support for many of the
 same use cases as the previously proposed object extension literals and
 the JSFixed Object.extend proposal.

 The most important characteristic of := is that it uses
 [[DefineOwnProperty]] semantics rather than [[Put]] semantics to define
 properties on its target object so it doesn't run into issues with
 assignment to accessor properties or over-riding inherited readonly
 properties. It is also smart about dealing with methods that reference
 super.

 Some basic examples:

   target := src;  //define all own properties of src onto target

   //add a method and an accessor to an existing prototype
   Point.prototype := {
      plus(aPoint) {return new this.constructor(this.x+aPoint.x,this.y+aPoint.y)},
      get rho() {return Math.sqrt(this.x*this.x+this.y*this.y)}
   };


 Have at it,

 Allen


Re: strawman for the := operator

2012-08-08 Thread Erik Corry

Hi

This proposal offers a way to get around some of the strange semantics
of '=', specifically the way read-only properties and setters on
objects in the prototype chain can restrict what you can do on the
receiver of an assignment.  However it has some strangeness itself:

* There is little point in having read-only properties if the common
way to do assignment is :=.  := will just walk all over a read-only
property.

* Copying private members from one object to another violates the
encapsulation pretty badly.  You would hope that using private names
allowed you to easily reason about which objects have which
properties, just by looking at the limited number of places a private
name is used.  But with this any code in the system that has two
instances of a class can splat object a's private properties with
those from object b.  It's rather like a replay attack in crypto.

* I don't understand the super stuff.  Is there a typo here?:
MyConstructor.prototype = Object.create(Baz);  //inherit from Bar, not Foo
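
The first point can be seen with today's Object.defineProperty standing in for :=. This is only an approximation of the strawman's semantics, and `tryAssign` is a name made up for the example.

```javascript
// A prototype with a read-only data property.
const proto = {};
Object.defineProperty(proto, 'x', { value: 1, writable: false });
const obj = Object.create(proto);

// [[Put]]-style assignment is blocked by the inherited read-only property.
function tryAssign(target, key, value) {
  'use strict';
  try {
    target[key] = value;
    return true;
  } catch (e) {
    return false;                  // TypeError in strict mode
  }
}
const assigned = tryAssign(obj, 'x', 2);

// [[DefineOwnProperty]]-style definition creates an own property that
// shadows the inherited one, regardless of its read-only status.
Object.defineProperty(obj, 'x', {
  value: 2, writable: true, enumerable: true, configurable: true,
});
```

After this runs, `obj.x` is 2 while `proto.x` is still 1, which is the "walking all over" behaviour in miniature.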


On 7 August 2012 00:44, Allen Wirfs-Brock al...@wirfs-brock.com wrote:
 Based upon discussions last week in the July 25, 2012 - TC39 Meeting Notes
 thread, I've created a new strawman proposal for a := operator.  See
 http://wiki.ecmascript.org/doku.php?id=strawman:define_properties_operator

 := is a convenient way to copy properties from one object to another or to
 extend an object with new properties.  It combines support for many of the
 same use cases as the previously proposed object extension literals and
 the JSFixed Object.extend proposal.

 The most important characteristic of := is that it uses
 [[DefineOwnProperty]] semantics rather than [[Put]] semantics to define
 properties on its target object so it doesn't run into issues with
 assignment to accessor properties or over-riding inherited readonly
 properties. It is also smart about dealing with methods that reference
 super.

 Some basic examples:

   target := src;  //define all own properties of src onto target

   //add a method and an accessor to an existing prototype
   Point.prototype := {
      plus(aPoint) {return new this.constructor(this.x+aPoint.x,this.y+aPoint.y)},
      get rho() {return Math.sqrt(this.x*this.x+this.y*this.y)}
   };


 Have at it,

 Allen

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Unicode normalization

2012-05-30 Thread Erik Corry

2012/5/30 Norbert Lindenberg ecmascr...@norbertlindenberg.com:
 This is for the Language Specification, not the Internationalization API 
 Specification.

 The assumptions are in the Language Specification, so they have to be 
 addressed there.

 A normalization API can live in the Language Specification or in the 
 Internationalization API. If we keep it simple (as this one function), then I 
 think it can easily be added to String.prototype. More fine-grained 
 functionality (like in ICU) would have to go into the Internationalization 
 API (v2). The two are not mutually exclusive.

Having read through the normalization spec I don't agree that it is
simple.  I would suggest that this is more appropriately placed in the
internationalization API than in the core language.

Since concatenating two long canonicalized strings to make a new
canonicalized string is much faster than first concatenating, then
renormalizing, perhaps a method should be provided for that combined
concat-giving-a-normalized-result operation.
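
Why a combined operation would be useful can be shown with a small sketch. It assumes an engine that provides String.prototype.normalize (standardized later, in ES2015); the variable names are illustrative only. Two strings that are each already in NFC can concatenate to a string that is not, so the join point needs renormalizing:

```javascript
const left = "cafe";     // already in NFC
const right = "\u0301";  // COMBINING ACUTE ACCENT, also NFC on its own

const joined = left + right;
const renormalized = joined.normalize("NFC");

console.log(left === left.normalize("NFC"));    // true
console.log(right === right.normalize("NFC"));  // true
console.log(joined === renormalized);           // false
console.log(renormalized === "caf\u00E9");      // true
```

Only the characters around the boundary change here, which is why a dedicated concat-and-normalize operation could avoid rescanning both long inputs.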

-- 
Erik Corry

Re: Digraphs and Unicode pretty-glyphs, for arrows, triangle, etc.

2012-04-11 Thread Erik Corry

2012/4/10 Andreas Rossberg rossb...@google.com:
 On 5 April 2012 17:35, Thaddee Tyl thaddee@gmail.com wrote:

 On Thu, Apr 5, 2012 at 5:00 PM, Adam Shannon a...@ashannon.us wrote:
  I don't see anything inherently wrong with adding some nice sugar to
  ES, because the people who will be using this math heavy notation
  will be those who are used to it. The everyday ecmascript programmer
  probably won't touch these because they might add extra work for them.
  Plus, it'd be nice to be able to read math in ES (for us math oriented
  folk).

 Leksah http://leksah.org/ is a Haskell IDE whose editor converts ->
 and other operators to their Unicode equivalent. It saves the file in
 ASCII.


 Indeed, this is standard practice for almost all functional languages. For
 example, even old-school Emacs modes for Haskell, OCaml, Agda, Coq, etc are
 all capable of rendering underlying ASCII with nice math characters, and
 have been for ages.

There are lots of apps that display JS source.  For example the web
based code review tool, the terminal output from 'svn blame' or the
pages produced by the svn-www source browsing gateway.  There are
services like gist or mailing lists like this one.  All of these are
capable of displaying Unicode characters with no problems these days,
but it would be too much to ask for all of them to autoconvert => into
=>.

Note that I said they could all display Unicode, but it is not
necessarily easy to input Unicode characters, so the ASCII version
still has to work.

-- 
Erik Corry

Re: Digraphs and Unicode pretty-glyphs, for arrows, triangle, etc.

2012-04-11 Thread Erik Corry

2012/4/11 Erik Corry erik.co...@gmail.com:
 but it would be too much to ask for all of them to autoconvert => into
 =>.

And that is how I found out that Gmail autoconverts in the opposite direction!

-- 
Erik Corry

Re: Digraphs and Unicode pretty-glyphs, for arrows, triangle, etc.

2012-04-10 Thread Erik Corry

2012/4/10 Andreas Rossberg rossb...@google.com:
 On 5 April 2012 17:35, Thaddee Tyl thaddee@gmail.com wrote:

 On Thu, Apr 5, 2012 at 5:00 PM, Adam Shannon a...@ashannon.us wrote:
  I don't see anything inherently wrong with adding some nice sugar to
  ES, because the people who will be using this math heavy notation
  will be those who are used to it. The everyday ecmascript programmer
  probably won't touch these because they might add extra work for them.
  Plus, it'd be nice to be able to read math in ES (for us math oriented
  folk).

 Leksah http://leksah.org/ is a Haskell IDE whose editor converts ->
 and other operators to their Unicode equivalent. It saves the file in
 ASCII.


 Indeed, this is standard practice for almost all functional languages. For
 example, even old-school Emacs modes for Haskell, OCaml, Agda, Coq, etc are
 all capable of rendering underlying ASCII with nice math characters, and
 have been for ages.

 No need to burden the language with multiple representations. Algol 68 tried
 and failed :).

I think Unicode support has come a long way since then.

-- 
Erik Corry

Re: RegExp.escape()

2012-03-23 Thread Erik Corry

2012/3/23 Steven Levithan steves_l...@hotmail.com:
 On Wednesday, Jan 04, 2012 at 8:03 PM, Kris Kowal wrote:

 On Sun, Jun 13, 2010 at 7:50 AM, Jordan Osete jordan.os...@yahoo.fr
 wrote:

 Hello everybody.

 How about standardizing something like RegExp.escape() ?
 http://simonwillison.net/2006/Jan/20/escape/

 It is trivial to implement, but it seems to me that this functionality
 belongs to the language - the implementation obviously knows better
 which characters must be escaped, and which ones don't need to.


 +1


 +1, again.

 Although this is only a minor convenience since you can do something like
 text.replace(/[-[\]{}()*+?.,\\^$|]/g, "\\$&"), the list of special
 characters is subject to change. E.g., if ES adds /x, whitespace (and
 possibly #) must be added.

In perl the recommended version is

text.replace(/([^a-zA-Z0-9])/g, "\\$1")

which is future-proof and safe and I think this also works for JS.
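
A minimal sketch of how that one-liner could be wrapped and used. The helper name escapeForRegExp is hypothetical (it is not a built-in), and the sketch assumes regexps without the later `u` flag, where a backslash before any non-alphanumeric character is accepted as an identity escape:

```javascript
// Escape everything that is not an ASCII letter or digit.
function escapeForRegExp(text) {
  return text.replace(/([^a-zA-Z0-9])/g, "\\$1");
}

const literal = "1+1=2 (really?)";
const exact = new RegExp("^" + escapeForRegExp(literal) + "$");
const dotted = new RegExp("^" + escapeForRegExp("a.b") + "$");

console.log(escapeForRegExp("a.b*c")); // a\.b\*c
console.log(exact.test(literal));      // true
console.log(dotted.test("axb"));       // false: the dot is now literal
```

The trade-off is that harmless characters such as spaces get escaped too, in exchange for not having to track the list of special characters.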

-- 
Erik Corry

Re: RegExp lookbehind

2012-03-18 Thread Erik Corry

2012/3/18 Steven L. steves_l...@hotmail.com:
 Lasse Reichstein wrote:

 I would simply apply same logic we have already for the look ahead ... or
 you think that would cause problems?


 I'm not sure it even makes sense.

 ES RegExps are backtracking based, and it makes a difference in which
 order alternatives are tried. Greedy matching is defined in terms of
 number of repetitions, not length of the match. All of these are
 defined in a way that assumes left-to-right matching.

 Example:
  Take the RegExp  /(?<((?:aa|aaa)+))b/  where (?< ... ) delimits the
 look-behind.
  and try matching it on the string xab.
  Then tell me how many a's are captured by the capturing group, and why :)

 The most intuitive interpretation would be a reverse implementation
 of the normal matching algorithm, i.e., backwards matching, but that
 would likely duplicate the entire RegExp semantics (or parameterize it
 by a direction).

 Any attempt to use the normal (forward) semantics and then try to find
 an earlier point to start it at is likely to be either flawed or
 effectively unpredictable to users.


 Technically, you're right. They're different. But they can appear exactly
 the same by implementing lookbehind as a zero-length assertion of
 (?:lookbehind)$ matched against the lookbehind's left context, starting from
 the very start of the subject string. Although people thinking about
 implementation might come to think of some other approach as more intuitive,
 from my experience every single plain-old-developer unconcerned about
 implementation thinks of the semantics I just described as intuitive. It is
 also how every single implementation of lookbehind that I know of actually
 works.

 The reason that all major regex flavors except .NET don't support lookbehind
 is because it's inefficient to re-search from the very beginning of an
 arbitrarily long string. That's why they support fixed- or finite-length
 lookbehind only--if they can determine the maximum distance backward they
 need to search forward from, they can step back only that many characters.

No wonder that look-behinds have a reputation for poor performance if
this is how it's done.

 In practice, at least for finite- rather than fixed-length lookbehind, this
 attempt to avoid far-back searches is kind of silly--e.g., Java lets you use
 a quantifier like {0,10} within lookbehind.

At least this provides something to point the user at and say look,
this is why you have bad performance.

 The Right-to-Left Mode that powers .NET's lookbehind is pretty neat. It
 magically follows the plain-old-developer's intuitive expectation while
 working backward rather than from the start of the string. Unfortunately,
 how it actually works is fairly mysterious. Although it works fairly
 reliably, as I previously mentioned it can occasionally be a bit
 buggy/weird.


 And you will probably never achieve that /(re)$/ and /(?<(re))$/
 always capture the same substring :)


 Apart from potential bugs, (re)$ and (?<=(re))$ capture the same string
 in every implementation of lookbehind that I know of.

Really?  I don't have a .NET implementation handy, but I'll make a
prediction based on the description of its algorithm.  Let's look at
the following.  (Replace the '+'s with {1,1000} for the sake of the
non-.NET regexps; I'm not going to type that ugliness here.)

/(x(.+?))$/
/(?<(x(.+?)))$/

Give these regexps the input "x foo x bar".  The first should match
"x foo x bar" and the second will match "x bar", as the non-greedy
quantifier will stop when it finds the nearest x from the end.  But in
a regexp engine with the "start at the earliest point and search
forwards" kludge they will return the same.
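
For reference, the prediction can be checked in an engine that matches lookbehind backwards. The sketch below assumes a JavaScript engine with ES2018 lookbehind, which uses the standard (?<= ... ) spelling and specifies right-to-left matching inside the assertion; the variable names are illustrative.

```javascript
const input = "x foo x bar";

// Ordinary forward match anchored at the end of the input.
const forward = /(x(.+?))$/.exec(input);

// The same sub-pattern inside a lookbehind, matched right to left.
const behind = /(?<=(x(.+?)))$/.exec(input);

console.log(forward[1]); // "x foo x bar"
console.log(behind[1]);  // "x bar"
console.log(behind[0]);  // "" (the assertion itself consumes nothing)
```

The two captures differ, so the choice of lookbehind algorithm is observable from user code.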

Sadly this means that if we settle on the non-.NET way to implement
look-behind it will be user-visible, so that implementations will not
have the option of using the efficient .NET algorithm.  Also we will
never be able to get rid of the limitation on infinite length
lookbehind without losing backwards compatibility.

-- 
Erik Corry

Re: Full Unicode based on UTF-16 proposal

2012-03-18 Thread Erik Corry

2012/3/18 Steven L. steves_l...@hotmail.com:
 Anyway, this is probably all moot, unless someone wants to officially
 propose POSIX character classes for ES RegExp. ...In which case I'll be
 happy to state about a half-dozen reasons to not do so. :)

Please do, they seem quite sensible to me.

In fact \w with Unicode support seems very similar to [:alnum:] to me.
If this one is useful, are there not other Unicode categories that
would be useful?

-- 
Erik Corry

Re: Full Unicode based on UTF-16 proposal

2012-03-17 Thread Erik Corry

2012/3/17 Steven L. steves_l...@hotmail.com:
 Erik Corry wrote:

 However I think we probably do want the /u modifier on regexps to
 control the new backward-incompatible behaviour.  There may be some
 way to relax this for regexp literals in opted in Harmony code, but
 for new RegExp(...) and for other string literals I think there are
 rather too many inconsistencies with the old behaviour.


 Disagree with adding /u for this purpose and disagree with breaking backward
 compatibility to let `/./.exec(s)[0].length == 2`.

Care to enlighten us with any thinking behind this disagreeing?

 Instead, if this is
 deemed an important enough issue, there are two ways to match any Unicode
 grapheme that match existing regex library precedent:

 From Perl and PCRE:

 \X

This doesn't work inside [].  Were you envisioning the same restriction in JS?

Also it matches a grapheme cluster, which may be useful but is
completely different to what the dot does.

 From Perl, PCRE, .NET, Java, XML Schema, and ICU (among others):

 \P{M}\p{M}*

 Obviously \X is prettier, but because it's fairly rare for people to care
 about this, IMO the more widely compatible solution that uses Unicode
 categories is Good Enough if Unicode category syntax is on the table for
 ES6.

 Norbert Lindenberg wrote:

 \u[\u-\u] is interpreted as [\u\u-\u\u]

Norbert, this just happens automatically if unmatched surrogates are
just treated as if they were normal code units.

 [\u-\u][\u-\u] is interpreted as
 [\u\u-\u\u]

Norbert, this will have different semantics to the current
implementations unless the second range is the full trail surrogate
range.

I agree with Steven that these two cases should just be left alone,
which means they will continue to work the way they have until now.

 Some people will want a way to match arbitrary Unicode code
 points rather than graphemes anyway, so leaving \u alone lets that use
 case continue working. This would still allow modifying the handling of
 literal astral/supplementary characters in RegExps. If it can be handled
 sensibly, I'm all for treating literal characters in RegExps as discrete
 graphemes rather than splitting them into surrogate pairs.

You seem to be confusing graphemes and Unicode code points.  Here is
the same text 3 times:

Four UTF-16 code units:

0x0020 0xD800 0xDF30 0x0308

Three Unicode code points:

0x20 0x10330 0x308

Two graphemes:

  ¨  -- This is an attempt to show a Gothic Ahsa with an umlaut.
My mail program probably screwed it up.

The proposal you are responding to is all about adding Unicode code
point handling to regexps.  It is not about adding grapheme support,
which is a rather different issue.
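
The three counts above can be reproduced in a current JavaScript engine. This is a sketch only: it assumes ES2015 string iteration and \u{...} escapes, and Intl.Segmenter for the grapheme count, which may be missing in older hosts (in which case the last value is left undefined).

```javascript
// Space, GOTHIC LETTER AHSA (U+10330), COMBINING DIAERESIS (U+0308).
const text = " \u{10330}\u0308";

const codeUnits = text.length;               // UTF-16 code units
const codePoints = Array.from(text).length;  // string iteration goes by code point

// Grapheme clusters need library support; Intl.Segmenter is assumed here.
const graphemes = typeof Intl !== "undefined" && Intl.Segmenter
  ? Array.from(
      new Intl.Segmenter(undefined, { granularity: "grapheme" }).segment(text)
    ).length
  : undefined;

console.log(codeUnits, codePoints, graphemes); // 4 3 2
```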

-- 
Erik Corry

Re: 128-bit IEEE DFP

2012-03-17 Thread Erik Corry

2012/3/17 Andrew Paprocki and...@ishiboo.com:
 I see there was a bunch of work done to possibly introduce 128-bit
 IEEE754r DFP back in the ES4 days:

 http://wiki.ecmascript.org/doku.php?id=proposals:decimals=decimal

 Has any work been done since ES4 to introduce a DFP type?

Hopefully, no, but in the meantime, Dart has introduced arbitrary
precision integers.

-- 
Erik Corry

Re: Full Unicode based on UTF-16 proposal

2012-03-17 Thread Erik Corry

2012/3/17 Norbert Lindenberg ecmascr...@norbertlindenberg.com:
 Steven, sorry, I wasn't aware of your proposal for /u when I inserted the 
 note on this flag into my proposal. My proposal was inspired by the use of /u 
 in PHP, where it switches from byte mode to UTF-8 mode. We'll have to see 
 whether it makes sense to combine the two under one flag or use two - 
 fortunately, Unicode still has a few other characters.

/foo/☃   // slash-unicode-snowman for the win! :-)

-- 
Erik Corry

P.S. I shudder to think what slash-pile-of-poo could mean.

Re: Full Unicode based on UTF-16 proposal

2012-03-17 Thread Erik Corry

2012/3/17 Steven L. steves_l...@hotmail.com:
 I further objected because I think the /u flag would be better used as a
 ASCII/Unicode mode switcher for \d\w\b. My proposal for this is based on
 Python's re.UNICODE or (?u) flag, which does the same thing except that it
 also covers \s (which is already Unicode-based in ES).

I am rather skeptical about treating \d like this.  I think "any digit
including rods and roman characters but not decimal points/commas"
http://en.wikipedia.org/wiki/Numerals_in_Unicode#Counting-rod_numerals
would be needed much less often than the digits 0-9, so I think
hijacking \d for this case is poor use of name space.  The \d escape
in perl does not cover other Unicode numerals, and even with the
[:name:] syntax there appears to be no way to get the Unicode
numerals:
http://search.cpan.org/~flora/perl-5.14.2/pod/perlrecharclass.pod#POSIX_Character_Classes
This suggests to me that it's not very useful.

And instead of changing the meaning of \w, which will be confusing, I
think that [:alnum:] as in perl would work fine.

\b is a little tougher.  The Unicode rewrite would be
(?:(?<![:alnum:])(?=[:alnum:])|(?<=[:alnum:])(?![:alnum:])) which is
obviously too verbose.  But if we take \b for this then the ASCII
version has to be written as
(?:(?<!\w)(?=\w)|(?<=\w)(?!\w)) which is also more than a little
annoying.  However, often you don't need that if you have negative
lookbehind because you can write something
like

/(?<!\w)word(?!\w)/  // Negative look-behind for a \w and negative
                     // look-ahead for \w at the end.

which isn't _too_ bad, even if it is much worse than

/\bword\b/
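
A quick way to convince oneself that the spellings agree, sketched for an engine that supports lookbehind (ES2018 or later, an assumption that did not hold when this was written). The sample strings and variable names are arbitrary:

```javascript
const shorthand = /\bword\b/;
const explicit = /(?<!\w)word(?!\w)/;

// Full ASCII expansion of a single \b, used on both sides of the word.
const boundary = "(?:(?<!\\w)(?=\\w)|(?<=\\w)(?!\\w))";
const expanded = new RegExp(boundary + "word" + boundary);

const samples = ["a word here", "swordfish", "word", "words", "(word)", "pass_word"];
const results = samples.map((s) => [shorthand.test(s), explicit.test(s), expanded.test(s)]);

samples.forEach((s, i) => console.log(JSON.stringify(s), ...results[i]));
```

All three columns print the same value for every sample; for example "a word here" matches and "swordfish" does not.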

 Indeed. My response was rushed and poorly formed. My apologies.

Gratefully accepted with the hope that my next rushed and poorly
formed response will also be forgiven!

-- 
Erik Corry

Re: Full Unicode based on UTF-16 proposal

2012-03-17 Thread Erik Corry

 the performance will be much different between a
lookbehind and a \b though.

-- 
Erik Corry

Re: Full Unicode based on UTF-16 proposal

2012-03-16 Thread Erik Corry

This is very useful, and was surely a lot of work.  I like the general
thrust of it a lot.  It has a high level of backwards compatibility,
does not rely on the VM having two different string implementations in
it, and it seems to fix the issues people are encountering.

However I think we probably do want the /u modifier on regexps to
control the new backward-incompatible behaviour.  There may be some
way to relax this for regexp literals in opted in Harmony code, but
for new RegExp(...) and for other string literals I think there are
rather too many inconsistencies with the old behaviour.

The algorithm given for codePointAt never returns NaN.  It should
probably do that for indices that hit a trail surrogate that has a
lead surrogate preceding it.

Perhaps it is outside the scope of this proposal, but it would also
make a lot of sense to add some named character classes to RegExp.

If we are making a /u modifier for RegExp it would also be nice to get
rid of the incorrect case independent matching rules.  This is the
section that says: "If ch's code unit value is greater than or equal
to decimal 128 and cu's code unit value is less than decimal 128,
then return ch."
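
The effect of that rule is visible with U+017F (LATIN SMALL LETTER LONG S), whose uppercase form is the ASCII letter S. The sketch below assumes an engine with the `u` flag as it was eventually standardized in ES2015, where case-insensitive matching uses Unicode case folding instead:

```javascript
const longS = "\u017F"; // LATIN SMALL LETTER LONG S

const upper = longS.toUpperCase();      // "S"
const legacy = /\u017F/i.test("S");     // non-ASCII to ASCII mapping is refused
const folded = /\u017F/iu.test("S");    // case folding maps both to "s"

console.log(upper, legacy, folded); // S false true
```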

2012/3/16 Norbert Lindenberg ecmascr...@norbertlindenberg.com:
 Based on my prioritization of goals for support for full Unicode in 
 ECMAScript [1], I've put together a proposal for supporting the full Unicode 
 character set based on the existing representation of text in ECMAScript 
 using UTF-16 code unit sequences:
 http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/index.html

 The detailed proposed spec changes serve to get a good idea of the scope of 
 the changes, but will need some polishing.

 Comments?

 Thanks,
 Norbert

 [1] https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html


Re: Full Unicode based on UTF-16 proposal

2012-03-16 Thread Erik Corry

2012/3/17 Norbert Lindenberg ecmascr...@norbertlindenberg.com:
 Thanks for your comments - a few replies below.

 Norbert


 On Mar 16, 2012, at 1:55 , Erik Corry wrote:

 However I think we probably do want the /u modifier on regexps to
 control the new backward-incompatible behaviour.  There may be some
 way to relax this for regexp literals in opted in Harmony code, but
 for new RegExp(...) and for other string literals I think there are
 rather too many inconsistencies with the old behaviour.

 Before asking developers to add /u, we should really have some evidence that 
 not doing so would cause actual compatibility issues for real applications. 
 Do you know of any examples?

No.  In general I don't think it is realistic to try to prove that
problematic code does not exist, since that requires quantifying over
all existing JS code, which is clearly impossible.

 Good point about Harmony code, although it seems opt-in got replaced by being 
 part of a module.

That would work too, I think.

 The algorithm given for codePointAt never returns NaN.  It should
 probably do that for indices that hit a trail surrogate that has a
 lead surrogate preceding it.

 NaN is not a valid code point, so it shouldn't be returned. If we want to 
 indicate access to a trailing surrogate code unit as an error, we should 
 throw an exception.

Then you should probably remove the text: "If there is no code unit at
that position, the result is NaN" from your proposal :-)

I am wary of using exceptions for non-exceptional data-driven events,
since performance is usually terrible and it's arguably an abuse of
the mechanism.  Your iterator code looks fine to me and needs neither
NaN nor exceptions.
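
To illustrate the point (this is not the iterator code from the proposal, which is not reproduced in this thread), here is a sketch of a code point loop that needs neither. It assumes codePointAt as later standardized in ES2015, which returns the code unit itself for an unmatched surrogate; codePointsOf is a hypothetical helper name.

```javascript
// Yield each code point of str in order, never landing on a trail
// surrogate that belongs to a pair.
function* codePointsOf(str) {
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);
    yield cp;
    i += cp > 0xFFFF ? 2 : 1; // supplementary code points occupy two code units
  }
}

const paired = Array.from(codePointsOf(" \u{10330}\u0308"), (cp) => cp.toString(16));
const lonely = Array.from(codePointsOf("a\uDC00b"), (cp) => cp.toString(16));

console.log(paired); // [ '20', '10330', '308' ]
console.log(lonely); // [ '61', 'dc00', '62' ]
```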

 Perhaps it is outside the scope of this proposal, but it would also
 make a lot of sense to add some named character classes to RegExp.

 It would make a lot of sense, but is outside the scope of this proposal. One 
 step at a time :-)

I can see that.  But if we are going to have multiple versions of the
RegExp syntax we should probably aim to keep the number down.

 If we are making a /u modifier for RegExp it would also be nice to get
 rid of the incorrect case independent matching rules.  This is the
 section that says: "If ch's code unit value is greater than or equal
 to decimal 128 and cu's code unit value is less than decimal 128,
 then return ch."

 And the exception for ß and other characters whose upper case equivalent
 has more than one code point ("If u does not consist of a single character,
 return ch." in the Canonicalize algorithm in ES 5.1).

Yes.


 2012/3/16 Norbert Lindenberg ecmascr...@norbertlindenberg.com:
 Based on my prioritization of goals for support for full Unicode in 
 ECMAScript [1], I've put together a proposal for supporting the full 
 Unicode character set based on the existing representation of text in 
 ECMAScript using UTF-16 code unit sequences:
 http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/index.html

 The detailed proposed spec changes serve to get a good idea of the scope of 
 the changes, but will need some polishing.

 Comments?

 Thanks,
 Norbert

 [1] https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html


Re: New full Unicode for ES6 idea

2012-03-02 Thread Erik Corry

2012/3/2 Glenn Adams gl...@skynav.com:

 On Fri, Mar 2, 2012 at 12:58 AM, Erik Corry erik.co...@gmail.com wrote:

 2012/3/1 Glenn Adams gl...@skynav.com:
  I'd like to plead for a solution rather like the one Java has, where
  strings are sequences of UTF-16 codes and there are specialized ways
  to iterate over them.  Looking at this entry from the Unicode FAQ:
  http://unicode.org/faq/char_combmark.html#7 there are different ways
  to describe the length (and iteration) of a string.  The BRS proposal
  favours #2, but I think for most applications utf-16-based-#1 is just
  fine, and for the applications that want to do it right #3 is almost
  always the correct solution.  Solution #3 needs library support in any
  case and has no problems with UTF-16.
 
  The central point here is that there are combining characters
  (accents) that you can't just normalize away.  Getting them right has
  a lot of the same issues as surrogate pairs (you shouldn't normally
  chop them up, they count as one 'character', you can't tell how many
  of them there are in a string without looking, etc.).  If you can
  handle combining characters then the surrogate pair support falls out
  pretty much for free.
 
 
  The problem here is that you are mixing apples and oranges. Although it
  *may* appear that surrogate pairs and grapheme clusters have features in
  common, they operate at different semantic levels entirely. A solution
  that
  attempts to conflate these two levels is going to cause problems at both
  levels. A distinction should be maintained between the following levels:
 
  (1) encoding units (e.g., UTF-16 coding units)
  (2) unicode scalar values (code points)
  (3) grapheme clusters

 This distinction is not lost on me.  I propose that random access
 indexing and .length in JS should work on level 1,


 that's where we are today: indexing and length based on 16-bit code units
 (of a UTF-16 encoding, likewise with Java)

Not really for JS.  Missing parts in the current UTF-16 support have
been listed in this thread, e.g. in Norbert Lindenberg's 6 point
prioritization list, which I replied to yesterday.

 and there should be
 library support for levels 2 and 3.  In order of descending usefulness
 I think the order is 1, 3, 2.  Therefore I don't want to cause a lot
 of backwards compatibility headaches by prioritizing the efficient
 handling of level 2.


 from a perspective of indexing Unicode characters, level 2 is the correct
 place;

Yes, by definition.

 level 3 is useful for higher level, language/locale sensitive text

No, the Unicode grapheme clustering algorithm is not locale or
language sensitive
http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

 processing, but not particularly interesting at the basic ES string
 processing level; we aren't talking about (or IMO should not be talking
 about) a level 3 text processing library in this thread;

I will continue to feel free to talk about it as I believe that in the
cases where just indexing by UTF-16 words is not sufficient it is
normally level 3 that is the correct level.  Also, I think there
should be support for this level in JS as it is not locale-dependent.

-- 
Erik Corry

Re: New full Unicode for ES6 idea

2012-03-01 Thread Erik Corry

I'm not in favour of big red switches, and I don't think the
compartment based solution is going to be workable.

I'd like to plead for a solution rather like the one Java has, where
strings are sequences of UTF-16 codes and there are specialized ways
to iterate over them.  Looking at this entry from the Unicode FAQ:
http://unicode.org/faq/char_combmark.html#7 there are different ways
to describe the length (and iteration) of a string.  The BRS proposal
favours #2, but I think for most applications utf-16-based-#1 is just
fine, and for the applications that want to do it right #3 is almost
always the correct solution.  Solution #3 needs library support in any
case and has no problems with UTF-16.

The central point here is that there are combining characters
(accents) that you can't just normalize away.  Getting them right has
a lot of the same issues as surrogate pairs (you shouldn't normally
chop them up, they count as one 'character', you can't tell how many
of them there are in a string without looking, etc.).  If you can
handle combining characters then the surrogate pair support falls out
pretty much for free.

Advantages of my proposal:

* High level of backwards compatibility
* No issues of where to place the BRS
* Compact and simple in the implementation
* Can be polyfilled on most VMs
* Interaction with the DOM is unproblematic
* No issues of what happens on concatenation if a surrogate pair is created.

Details:

* The built in string charCodeAt, [], length operations work in terms of UTF-16
* String.fromCharCode(x) can return a string with a length of 2
* New object StringIterator

new StringIterator(backing) returns a string iterator.  The iterator
has the following methods:

hasNext();            // Returns this.index() != this.backing().length
nextGrapheme();       // Returns the next grapheme as a Unicode code point,
                      // or -1 if the next grapheme is a sequence of code points
nextGraphemeArray();  // Returns an array of numeric code points
                      // (possibly just one) representing the next grapheme
nextCodePoint();      // Returns the next code point, possibly consuming
                      // both halves of a surrogate pair
index();              // Gets the current index in the string, from 0 to length
setIndex();           // Sets the current index in the string, from 0 to length
backing();            // Get the backing string

// Optionally
hasPrevious();
previous*();  // Analogous to nextGrapheme etc.
codePointLength(); // Takes O(length), cache the answer if you care
graphemeLength();  // Ditto

If any of the next.. functions encounter an unmatched half of a
surrogate pair they just return its number.
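
A rough sketch of the code point part of such an iterator, to show that it can be polyfilled with charCodeAt alone. The grapheme methods are left out because they need Unicode property tables. The private field names, the clamping in setIndex and the undefined result at the end of the string are assumptions of this sketch, not part of the proposal.

```javascript
class StringIterator {
  constructor(backing) {
    this._backing = String(backing);
    this._index = 0;
  }
  backing() { return this._backing; }
  index() { return this._index; }
  setIndex(i) {
    // Assumption: out-of-range indices are clamped to 0..length.
    this._index = Math.max(0, Math.min(i, this._backing.length));
  }
  hasNext() { return this._index !== this._backing.length; }
  nextCodePoint() {
    const s = this._backing;
    if (!this.hasNext()) return undefined; // assumption: no exception at the end
    const first = s.charCodeAt(this._index++);
    if (first >= 0xD800 && first <= 0xDBFF && this._index < s.length) {
      const second = s.charCodeAt(this._index);
      if (second >= 0xDC00 && second <= 0xDFFF) {
        this._index++; // consume the trail surrogate as well
        return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
      }
    }
    return first; // BMP code unit, or an unmatched surrogate half as-is
  }
}

const it = new StringIterator("a\uD800\uDF30\uD800");
const seen = [];
while (it.hasNext()) seen.push(it.nextCodePoint().toString(16));
console.log(seen); // [ '61', '10330', 'd800' ]
```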

Regexp support.  Regexps act 'as if' the following steps were performed.

Outside character classes an extended character turns into (?:xy)
where x and y are the two halves of its surrogate pair.
Inside positive character classes the extended characters are
extracted so [abz] becomes (?:[ab]|xy) where z is an extended
character and x and y are the two halves of its surrogate pair.
Negative character classes can be handled by transforming into
negative lookaheads.
A decent set of unicode character classes will likely subsume most
uses of these transformations.
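
The positive character class rewrite can be sketched as a source-to-source helper. The function name classToPattern is hypothetical, the sketch handles only a flat list of member characters (no ranges or negation), and it relies on ES2015 for-of iteration to split the input by code point.

```javascript
// Turn the members of a positive character class into an equivalent
// pattern that keeps each surrogate pair together.
function classToPattern(members) {
  const bmp = [];
  const astral = [];
  for (const ch of members) {                        // iterates by code point
    if (ch.length === 2) astral.push(ch);            // a full surrogate pair
    else bmp.push(ch.replace(/[\\\]\[^-]/, "\\$&")); // escape class metacharacters
  }
  const alternatives = [];
  if (bmp.length) alternatives.push("[" + bmp.join("") + "]");
  alternatives.push(...astral);
  return "(?:" + alternatives.join("|") + ")";
}

const ahsa = "\uD800\uDF30"; // U+10330 as a surrogate pair
const naive = new RegExp("^[ab" + ahsa + "]$");
const rewritten = new RegExp("^" + classToPattern("ab" + ahsa) + "$");

console.log(naive.test(ahsa), naive.test("\uD800"));         // false true
console.log(rewritten.test(ahsa), rewritten.test("\uD800")); // true false
```

Without the rewrite the class contains the two surrogate halves as separate members, so it matches half a character and never the whole one.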

Perhaps the BRS 21 bit solution feels marginally cleaner, but having
two different kinds of strings in the same VM feels like a horrible
solution that is user visible and will haunt implementations forever,
and the cleanliness difference is very marginal given that grapheme
based iteration is the correct solution for almost all the cases where
iterating over utf-16 codes is not good enough.

-- 
Erik Corry

2012/2/20 Phillips, Addison addi...@lab126.com:
 Mark wrote:



 First, it would be great to get full Unicode support in JS. I know that's
 been a problem for us at Google.



 AP +1: I think we’ve waited for supplementary character support long
 enough!



 Secondly, while I agree with Addison that the approach that Java took is
 workable, it does cause problems.



 AP The tension is between “compatibility” and “ease of use” here, I think.
 The question is whether very many scripts depend on the ‘uint16’ nature of a
 character in ES, use surrogates to effect supplementary character support,
 or are otherwise tied to the existing encoding model and are broken as a
 result of changes. In its ideal form, an ES string would logically be a
 sequence of Unicode characters (code points) and only the internal
 representation would worry about whatever character encoding scheme made the
 most sense (in many cases, this might actually be UTF-16).



 AP … but what I think is hard to deal with are different modes of
 processing scripts depending on “fullness of the Unicode inside”.
 Admittedly, the approach I favor is rather conservative and presents a
 number of challenges, most notably in adapting regex or for users who want
 to work strictly in terms of character values.



 There are good reasons for why Java did what it did, basically for
 compatibility. But if there is some way that JS can work around those,
 that'd be great.



 AP Yes, it would.



 ~Addison

Re: New full Unicode for ES6 idea

2012-03-01 Thread Erik Corry

2012/2/22 Norbert Lindenberg ecmascr...@norbertlindenberg.com:
I'll reply to Brendan's proposal in two parts: first about the goals for
supplementary character support, second about the BRS.

Full 21-bit Unicode support means all of:

* indexing by characters, not uint16 storage units;
* counting length as one greater than the last index; and
* supporting escapes with (up to) six hexadecimal digits.

For me, full 21-bit Unicode support has a different priority list.

First come the essentials: Regular expressions; functions that interpret
strings; the overall sense that all Unicode characters are supported.

1) Regular expressions must recognize supplementary characters as atomic
entities, and interpret them according to Unicode semantics.

Look at the contortions one has to go through currently to describe a simple
character class that includes supplementary characters:
https://github.com/roozbehp/yui3-gallery/blob/master/src/gallery-intl-bidi/js/intl-bidi.js

Read up on why it has to be done this way, and see to what extremes some
people are going to make supplementary characters work despite ECMAScript:
http://inimino.org/~inimino/blog/javascript_cset

Now, try to figure out how you'd convert a user-entered string to a regular
expression such that you can search for the string without case distinction,
where the string may contain supplementary characters such as жвь (Deseret
for one).

Regular expressions matter a lot here because, if done properly, they
eliminate much of the need for iterating over strings manually.
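
A minimal sketch of what "atomic" means in point 1, assuming an engine that supports the regular expression `u` flag. That flag was standardised later (ES2015) and is not part of the ES5 being discussed in this thread. The sample characters U+10400 and U+10428 (a Deseret upper/lower case pair) and all variable names are chosen here for illustration only.

```javascript
// U+10400 is stored as the surrogate pair D801 DC00.
var deseret = "\uD801\uDC00";

// Without the u flag the regexp works on separate 16-bit code units.
var perUnit = /^.$/.test(deseret);    // a single "." cannot cover the pair
var twoUnits = /^..$/.test(deseret);  // two "." are needed

// With the u flag the pair is matched as one atomic character.
var perChar = /^.$/u.test(deseret);

// Case-insensitive matching of a supplementary character also needs the u flag.
var lower = "\uD801\uDC28";           // U+10428, the lowercase partner
var foldedWithU = /^\uD801\uDC00$/iu.test(lower);
var foldedWithoutU = /^\uD801\uDC00$/i.test(lower);

console.log(perUnit, twoUnits, perChar, foldedWithU, foldedWithoutU);
```

Without the flag, a character class containing supplementary characters has to be spelled out as alternations over surrogate ranges, which is the contortion the links above describe.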

2) Built-in functions that interpret strings have to recognize supplementary
characters as atomic entities and interpret them according to their Unicode
semantics. The list of functions in ES5 that violate this principle is
actually rather short: Besides the String functions relying on regular
expressions (match, replace, search, split), they're the String case
conversion functions (toLowerCase, toLocaleLowerCase, toUpperCase,
toLocaleUpperCase) and the relational comparison for strings (11.8.5). But
the principle is also important for new functionality being considered for
ES6 and above.

3) It must be clear that the full Unicode character set is allowed and
supported. This means at least getting rid of the reference to UCS-2 (clause
2) and the bizarre equivalence between characters and UTF-16 code units
(clause 6). ECMAScript has already defined several ways to create UTF-16
strings containing supplementary characters (parsing UTF-8 source; using
Unicode escapes for surrogate pairs), and lets applications freely pass
around such strings. Browsers have surrounded ECMAScript implementations with
text input, text rendering, DOM APIs, and XMLHTTPRequest with full Unicode
support, and generally use full UTF-16 to exchange text with their ECMAScript
subsystem. Developers have used this to build applications that support
supplementary characters, hacking around the remaining gaps in ECMAScript as
seen above. But, as in the bug report that Brendan pointed to this morning
(http://code.google.com/p/v8/issues/detail?id=761), the mention of UCS-2 is
still used by some to excuse bugs.

I agree that these are the priorities and should be done, including
reopening and fixing the V8 bug.

Only after these essentials come the niceties of String representation and
Unicode escapes:

4) 1 String element to 1 Unicode code point is indeed a very nice and
desirable relationship. Unlike Java, where binary compatibility between
virtual machines made a change from UTF-16 to UTF-32 impossible, JavaScript
needs to be compatible only at the source code level - or maybe, with a BRS,
not even that.

I don't think this is important enough to justify
incompatibility/implementation pain.

Agree with your points 5 and 6. One extra point of my own:

* I think we should prefer transparency in cases where there is doubt.
This means passing data through with no errors or changes. It means
allowing half surrogate pairs, combining characters that have nothing
to combine with and characters that are not currently assigned in
Unicode. In an ideal world, it's hard to see why these happen, but in
the cases where they happen the most helpful thing to do is almost
always to allow/ignore them.

Here are two hypothetical examples:

We get data from a source that chops up a UTF-16 text into chunks and
sends them separately for transmission. This will result in unmatched
pairs of surrogates, but as long as our application transmits the
data unchanged, no harm results after they are recombined later.
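
The chunking case can be sketched as follows. The sample string and the cut position are assumptions made for illustration; the point is only that a string holding a lone surrogate survives being passed around unchanged.

```javascript
// "a", then U+1F600 as a surrogate pair, then "b".
var text = "a\uD83D\uDE00b";

// Cut in the middle of the surrogate pair, as a naive chunker might.
var first = text.slice(0, 2);   // ends with a lone high surrogate
var second = text.slice(2);     // starts with a lone low surrogate

// Each chunk is malformed UTF-16 on its own...
var firstEndsWithLoneSurrogate = first.charCodeAt(1) === 0xD83D;

// ...but if nothing rewrites or rejects the data, recombining restores it.
var rejoined = first + second;

console.log(firstEndsWithLoneSurrogate, rejoined === text);
```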

Take an XML format where all the tags are ASCII, but there is body
text that contains floating point numbers encoded as 16 bit values,
including malformed surrogate pairs. This is pretty sick, but who are
we to judge? We want to treat this as a string because we can use
string operations on the XML tags, but it would be extremely unhelpful
to

Re: New full Unicode for ES6 idea

2012-03-01 Thread Erik Corry

2012/3/1 Glenn Adams gl...@skynav.com:

 2012/3/1 Erik Corry erik.co...@gmail.com

 I'm not in favour of big red switches, and I don't think the
 compartment based solution is going to be workable.

 I'd like to plead for a solution rather like the one Java has, where
 strings are sequences of UTF-16 codes and there are specialized ways
 to iterate over them.  Looking at this entry from the Unicode FAQ:
 http://unicode.org/faq/char_combmark.html#7 there are different ways
 to describe the length (and iteration) of a string.  The BRS proposal
 favours #2, but I think for most applications utf-16-based-#1 is just
 fine, and for the applications that want to do it right #3 is almost
 always the correct solution.  Solution #3 needs library support in any
 case and has no problems with UTF-16.

 The central point here is that there are combining characters
 (accents) that you can't just normalize away.  Getting them right has
 a lot of the same issues as surrogate pairs (you shouldn't normally
 chop them up, they count as one 'character', you can't tell how many
 of them there are in a string without looking, etc.).  If you can
 handle combining characters then the surrogate pair support falls out
 pretty much for free.


 The problem here is that you are mixing apples and oranges. Although it
 *may* appear that surrogate pairs and grapheme clusters have features in
 common, they operate at different semantic levels entirely. A solution that
 attempts to conflate these two levels is going to cause problems at both
 levels. A distinction should be maintained between the following levels:

 encoding units (e.g., UTF-16 coding units)
 unicode scalar values (code points)
 grapheme clusters

This distinction is not lost on me.  I propose that random access
indexing and .length in JS should work on level 1, and there should be
library support for levels 2 and 3.  In order of descending usefulness
I think the order is 1, 3, 2.  Therefore I don't want to cause a lot
of backwards compatibility headaches by prioritizing the efficient
handling of level 2.
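
A rough sketch of the three levels on one string. It assumes a modern engine: the string iterator (ES2015) stands in for "library support for level 2" and `Intl.Segmenter` (ECMA-402) for level 3. Neither existed when this was written, and the sample string is an arbitrary choice.

```javascript
// "e" + COMBINING ACUTE ACCENT, followed by U+1F600 (a supplementary character).
var s = "e\u0301\uD83D\uDE00";

// Level 1: UTF-16 code units, which is what .length and indexing use.
var codeUnits = s.length;

// Level 2: code points, via the string iterator.
var codePoints = Array.from(s).length;

// Level 3: grapheme clusters, via library support.
var segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
var graphemes = Array.from(segmenter.segment(s)).length;

console.log(codeUnits, codePoints, graphemes);
```

The three counts differ (4, 3 and 2 here), which is why the choice of level for `.length` matters and why level 3 needs library support whatever the storage format is.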


 IMO, the current discussion should limit itself to the interface between the
 first and second of these levels, and not introduce the third level into the
 mix.

 G.
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: [friam] Fwd: Hash Collision Denial of Service

2012-01-06 Thread Erik Corry

For hash maps with string keys, people can concatenate the string keys
with a random prefix. This fixes this attack, and also prevents the
attacker from using annoying keys like __proto__, hasOwnProperty or
toString. It doesn't fix things for JSON though, if you are reading
untrusted (in the DOS sense) JSON.
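
A minimal sketch of the prefixing idea, assuming a hypothetical `PrefixedMap` helper (the name and its methods are made up here). `Math.random()` is only a stand-in for the random prefix; it is not a cryptographically strong source.

```javascript
function PrefixedMap() {
  // Per-map random prefix (stand-in source of randomness, see note above).
  this.prefix = "k" + Math.random().toString(36).slice(2) + "_";
  this.store = {};
}

PrefixedMap.prototype.set = function (key, value) {
  this.store[this.prefix + key] = value;
};

PrefixedMap.prototype.has = function (key) {
  return Object.prototype.hasOwnProperty.call(this.store, this.prefix + key);
};

PrefixedMap.prototype.get = function (key) {
  return this.has(key) ? this.store[this.prefix + key] : undefined;
};

var m = new PrefixedMap();
m.set("__proto__", 1);        // stored as an ordinary own property
m.set("hasOwnProperty", 2);   // does not shadow anything the map relies on

console.log(m.get("__proto__"), m.get("hasOwnProperty"), m.has("toString"));
```

Because the attacker does not know the prefix, keys chosen to collide for the unprefixed strings are not guaranteed to collide once prefixed, and the awkward keys become plain data.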

While V8 is fixing this DOS attack, I am not entirely happy about that
because it sends a signal that it is a good idea to use non-prefixed
property strings on objects as hash maps. The issues around that are
often much worse than a CPU eating DOS that only really hurts when you
have more than 10k keys. See for example
https://groups.google.com/a/googleproductforums.com/forum/#!category-topic/docs/documents/0hQWeOvCcHU

2012/1/6 Mark S. Miller erig...@google.com:
There is currently an informal (partial?) consensus to try to add high
entropy identity hashes to ES6 (but no proposal page yet), so that users can
build hashtables for themselves. Were they to do so, they would immediately
find they'd want to include non-objects as keys as well (like Map does), and so
we might be tempted to expose a stable data hashing function to support such
uses. The following surprised me, even though it was apparently well known
(not by me ;)) since 2003.

from https://groups.google.com/forum/#!topic/friam/jKRZrb5bQEA:

Forwarded conversation
Subject: [friam] Fwd: Hash Collision Denial of Service

From: Bill Frantz fra...@pwpconsult.com
Date: Thu, Jan 5, 2012 at 11:51 AM
To: Design fr...@googlegroups.com

From: @RISK: The Consensus Security Vulnerability Alert Week 1 2012

== Forwarded Message ==
Date: 1/5/12 19:37
From: consensussecurityvulnerabilityal...@sans.org (The SANS Institute)

12.2.5 CVE: Not Available
Platform: Cross Platform
Title: Java Hash Collision Denial of Service
Description: Java is a programming language. The application is
exposed to a denial of service issue due to an
error during hashing form posts and updating a hash table. Specially
crafted forms in HTTP POST requests can trigger hash collisions
resulting in high CPU consumption. Java 7 and prior are affected.
Ref: http://www.ocert.org/advisories/ocert-2011-003.html
http://www.securityfocus.com/bid/51236/references
__

12.2.6 CVE: Not Available
Platform: Cross Platform
Title: Python Hash Collision Denial of Service
Description: Python is a programming language available for multiple
platforms. The application is exposed to a denial of service issue
due to an error during hashing form posts and updating a hash table.
Specially crafted forms in HTTP POST requests
can trigger hash collisions resulting in high CPU consumption.
All versions of Python are affected.
Ref: http://www.securityfocus.com/bid/51239/references
__
== End Forwarded Message ==

It seems to me, short of using secure hashes, any use of hashtables is
subject to this attack if the attacker can control the data being hashed.

Cheers - Bill

---
Bill Frantz |We used to quip that password is the most common
408-356-8506 | password. Now it's 'password1.' Who said users haven't
www.periwinkle.com | learned anything about security? -- Bruce Schneier

--
From: Brian Warner war...@lothar.com
Date: Thu, Jan 5, 2012 at 12:09 PM
To: fr...@googlegroups.com

Given the limited number of output buckets, I don't think a secure hash
would win you much (i.e. there are no secure 10-bit hashes). Instead, I
think you want to mix things up a bit, by including a per-runtime random
secret in the hash calculation (generated each time the program starts,
maybe for each dictionary you allocate). And then hope that you don't
expose enough information to the attacker (perhaps by enumerating
dictionaries in implementation-defined order without sorting the keys)
to let them deduce the secret, and thus be able to force a lot of
collisions.
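
A sketch of mixing a per-run secret into a string hash, under stated assumptions: the function `makeSeededHash` is hypothetical, the mixing is plain FNV-1a with the secret folded into the starting value, and `Math.random()` is a stand-in for a proper random source. Seeded FNV-1a is not known to resist deliberate collisions; it only shows where the secret enters the calculation.

```javascript
function makeSeededHash(seed) {
  return function hash(str) {
    // Fold the secret into the FNV-1a offset basis.
    var h = (0x811c9dc5 ^ seed) >>> 0;
    for (var i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;   // multiply by the FNV prime, keep 32 bits
    }
    return h;
  };
}

// One secret per program start (stand-in randomness).
var secret = Math.floor(Math.random() * 0x100000000);
var hash = makeSeededHash(secret);

// The bucket index depends on the secret, so it differs from run to run.
var bucket = hash("some key") % 1024;
console.log(bucket);
```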

I was re-reading djb/agl's articles on crit-bit trees (aka PATRICIA
trees, or tries, for those in the router world), and making the argument
that programming languages should use a crit-bit tree as their
fundamental data structure rather than a hash-table -based dictionary
(because you get some additional operations for cheap, like sorted
enumeration). I'm not sure if this would be any less vulnerable to
attack.. seems like a series of [1,11,111,,1,..] keys would
cause similar problems.

http://cr.yp.to/critbit.html

Re: Alternative proposal to privateName.public

2011-12-26 Thread Erik Corry

2011/12/22 Tom Van Cutsem tomvc...@gmail.com:
 At first, I shared Andreas's concern about introducing a flag in feature X
 that only seems to affect a superficially unrelated feature Y.

 However, having skimmed the private names page, I stumbled upon a section
 that seems to want to introduce precisely such a flag for different but
 related purposes:
 http://wiki.ecmascript.org/doku.php?id=harmony:private_name_objects#possible_visibility_flag

 Having also just read about the different use cases of private names
 versus just unique names, it would make a lot of sense to me if we would
 separate these two (either via a flag or via different constructors):

 - private names: invisible to for..in, Object.getOwnPropertyNames, and even
 proxies
 - unique names: fully visible to for..in, Object.getOwnPropertyNames, and
 proxies


I don't see how you need anything new in the language to support unique names.


var newUniqueName = (function() {
  var counter = 0;
  return function () {
    return "__uniquename__" + counter++;
  };
})();


var MyClass = (function() {
  var private1 = newUniqueName();
  var private2 = newUniqueName();
  return function() {
    this.my_public_variable = 0;
    this[private1] = 1;
    this[private2] = 2;
  };
})();

var my_instance = new MyClass();

I'm a big fan of building what we need out of what we have rather than
adding more and more to the language.  Benefits include:

* Keeps VM complexity down
* Available now on all browsers
* If it turns out to be dumb we are not stuck with it for all eternity


 By tying the flag to use cases of private names (are you interested in the
 name's privacy or its uniqueness?), it makes more sense to include it in the
 API.

 Cheers,
 Tom

 2011/12/20 David Bruant bruan...@gmail.com

 Le 17/12/2011 14:24, David Bruant a écrit :
  (...)
 
  # Proposal
 
  What about a 'trapped' boolean in the private name constructor?
 
  It would work this way:
 
  `JavaScript
  var n = new Name(false); // don't trap when used in a proxy
  var p = new Proxy({}, maliciousHandler);
 
  p[n] = 21; // the set trap is NOT called!
  var v = p[n]; // the get trap is NOT called and v === 21
  `
 
  Basically, when the private name is created with the 'trapped' boolean
  to false, the proxy becomes oblivious to being called with these private
  names.
 There have been some other proposals suggesting ways to bypass the proxy
 to work directly on the target. Since I have been brief, my proposal
 could be interpreted as such, which was not my intention. So here are
 additional code snippets to further explain my proposal.
 -
 var n = new Name(false); // untrapped name
 var t = {};

 var p = new Proxy(t, maliciousHandler);

 p[n] = 21; // the malicious set trap is NOT called!
 var v = p[n]; // the malicious get trap is NOT called and v === 21

 Object.getOwnPropertyDescriptor(t, n); // undefined
 -

 In this proposal, in the case of untrapped names, only the proxy
 identity is used internally. No trap is called and the target remains
 untouched.
 There is neither implicit nor explicit forwarding to the target. If the
 code in possession of both a reference to the private name and a
 suspicious object does not want the suspicious object to have to do
 anything with the name, it can define the private name as untrapped and
 the proxy will be oblivious to the private name.

 This choice is made in order to make the private name owners responsible
 for what they do with the private name, choose who they want the name to
 be shared with.

 David

Re: Alternative proposal to privateName.public

2011-12-26 Thread Erik Corry

2011/12/26 David Bruant bruan...@gmail.com:
 Le 26/12/2011 15:56, Erik Corry a écrit :
 2011/12/22 Tom Van Cutsem tomvc...@gmail.com:
 At first, I shared Andreas's concern about introducing a flag in feature X
 that only seems to affect a superficially unrelated feature Y.

 However, having skimmed the private names page, I stumbled upon a section
 that seems to want to introduce precisely such a flag for different but
 related purposes:
 http://wiki.ecmascript.org/doku.php?id=harmony:private_name_objects#possible_visibility_flag

 Having also just read about the different use cases of private names
 versus just unique names, it would make a lot of sense to me if we would
 separate these two (either via a flag or via different constructors):

 - private names: invisible to for..in, Object.getOwnPropertyNames, and even
 proxies
 - unique names: fully visible to for..in, Object.getOwnPropertyNames, and
 proxies

 I don't see how you need anything new in the language to support unique 
 names.


 var newUniqueName = (function() {
   var counter = 0;
   return function () {
     return "__uniquename__" + counter++;
   };
 })();
 I think that in the proposal, the definition of "unique" is "unique
 across the program".
 And this can't be achieved in JavaScript since no program can know, at a
 given time, which names are used and which are not. It also cannot know
 which names will be generated (this last part is undecidable anyway).

This can be fixed by convention.  As long as there is only one
uniqueName function then the names it makes will be unique.  To ensure
there is only one it can be installed like so:

if (!Object.newUniqueName) Object.newUniqueName = (...  // See above.

The "__uniquename__" string above can be replaced with something like
"__i_have_read_and_abide_by_the_unique_name_convention__".

If that seems onerous to you consider whether it is really harder than
modifying the VM and then waiting 10 years for your change to be
universally available.
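
Spelled out, the install-once convention might look like the sketch below. `Object.newUniqueName` is the name proposed in this message, not an existing built-in, and the second guard simulates another library loading later.

```javascript
if (!Object.newUniqueName) {
  Object.newUniqueName = (function () {
    var counter = 0;
    return function () {
      return "__i_have_read_and_abide_by_the_unique_name_convention__" + counter++;
    };
  })();
}

// A second library running the same guard reuses the existing generator.
var installedFirst = Object.newUniqueName;
if (!Object.newUniqueName) {
  Object.newUniqueName = function () { return "never installed"; };
}

var a = Object.newUniqueName();
var b = Object.newUniqueName();
console.log(a !== b, Object.newUniqueName === installedFirst);
```

The names are unique only among code that follows the convention; nothing stops other code from writing the same string by hand.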

 If it was considered to introduce a module that contained a property
 name in order to access ES5.1 [[Class]], this name could not be a string
 since it could conflict with other names used on this object. However, a
 unique name could be used.
 -
 import className from @class;

 var o = {'class': 1, 'className':2, '[[Class]]':3};
 var a = o | [];

 o[className]; // Object
 a[className]; // Array
 -

 Despite my effort to make a collision in o definition, I can safely
 retrieve the internal [[Class]] of an object since the unique name
 stored in the className variable is guaranteed to be unique.

 This will also allow implementors to experiment whatever they want
 without polluting objects, nor doing weird string-based property-like
 non-properties like '__proto__', '__noSuchMethod__'.

 I don't know for [[class]], but iterators [1] already propose such a thing.

 David

 [1] http://wiki.ecmascript.org/doku.php?id=harmony:iterators

Re: Have the scope accessible as an object and implement a new with operator / block

2011-12-20 Thread Erik Corry

2011/12/20 Xavier MONTILLET xavierm02@gmail.com:
 Hi,

 There is this one thing in JavaScript I really hate: You can't do
 {{$varName}} as in PHP so as soon as you need dynamic names, you have
 to put everything in an object...

 I don't know how scopes are implemented

Not being able to get hold of the scope as an object enables important
optimizations and is a feature, not a bug.

-- 
Erik Corry

Re: Math: n-th root, logarithm with arbitrary base

2011-12-19 Thread Erik Corry

2011/12/19 Axel Rauschmayer a...@rauschma.de:
 But wouldn’t that rather be a reason for making these functions part of the
 core language?

MS have proposed log2, log10, log1p.  This makes more sense to me.
These already exist and are useful in other languages and are not too
hard to implement.
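
A small sketch of why a dedicated `log1p` earns its place, assuming an engine that provides `Math.log1p` (it was standardised later, in ES2015). The input value is an arbitrary choice.

```javascript
var tiny = 1e-20;

// 1 + tiny rounds to exactly 1, so the naive form loses the input entirely.
var naive = Math.log(1 + tiny);

// log1p works on tiny directly and keeps its magnitude.
var careful = Math.log1p(tiny);

console.log(naive, careful);
```

This is not something user code can fix by composing `Math.log` with an addition, because the information is already gone after `1 + tiny`.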

If we go beyond that, it would be nice to see a sample implementation
that is neither trivial nor inaccurate.

 On Dec 19, 2011, at 8:48 , Erik Corry wrote:

 Both the proposed implementations do fp rounding twice, and so produce
 an inaccurate answer.  I think we should probably leave it to the user
 to define incorrect math functions, rather than bake them into the
 language.

 I haven’t seen these two functions among the proposed additions for Math
 (should these be in a math module?):


    function nthRoot(n, x) {
        return Math.pow(x, 1/n);
    }

    function log_b(b, x) {
        return Math.log(x) / Math.log(b);
    }


 Have they been considered and rejected?


 --
 Dr. Axel Rauschmayer
 a...@rauschma.de

 home: rauschma.de
 twitter: twitter.com/rauschma
 blog: 2ality.com




Re: Math: n-th root, logarithm with arbitrary base

2011-12-18 Thread Erik Corry

Both the proposed implementations do fp rounding twice, and so produce
an inaccurate answer.  I think we should probably leave it to the user
to define incorrect math functions, rather than bake them into the
language.
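
The double rounding can be seen with the two proposed functions themselves. The inputs 64 and 1000 are chosen here because the exact answers are integers; the precise digits printed may vary between engines, so none are quoted.

```javascript
// 1/n is rounded once, then Math.pow rounds again.
function nthRoot(n, x) {
  return Math.pow(x, 1 / n);
}

// Each logarithm is rounded, then the division rounds again.
function log_b(b, x) {
  return Math.log(x) / Math.log(b);
}

var cubeRoot = nthRoot(3, 64);       // mathematically 4
var log10of1000 = log_b(10, 1000);   // mathematically 3

console.log(cubeRoot === 4, cubeRoot);
console.log(log10of1000 === 3, log10of1000);
```

On the engines this was checked against in mind (V8-style fdlibm math), both comparisons come out false and the results sit just below the exact integers.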

2011/12/19 Axel Rauschmayer a...@rauschma.de:
 I haven’t seen these two functions among the proposed additions for Math 
 (should these be in a math module?):

    function nthRoot(n, x) {
        return Math.pow(x, 1/n);
    }

    function log_b(b, x) {
        return Math.log(x) / Math.log(b);
    }

 Have they been considered and rejected?

 Axel

 --
 Dr. Axel Rauschmayer
 a...@rauschma.de

 home: rauschma.de
 twitter: twitter.com/rauschma
 blog: 2ality.com




Re: Minimalist Classes

2011-11-02 Thread Erik Corry

2011/11/2 Quildreen Motta quildr...@gmail.com:
 I don't think hard coding the name of the super-constructor is a
 problem.

 It is when you take into account that functions in JavaScript are not bound
 to an object, they are generic. You can simply assign any function to any
 object and it'll most likely just work.

I think the chances are slim that you can take a function that does a
super call, put it on a different object, and it will 'just work'.
It's a pretty rare case.
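
A sketch of the situation being argued about, using ES5-style constructors with the super-constructor named explicitly. `Base`, `Derived` and `unrelated` are made-up names.

```javascript
function Base(name) {
  this.name = name;
}
Base.prototype.describe = function () {
  return "Base(" + this.name + ")";
};

function Derived(name) {
  Base.call(this, name);   // super-constructor named explicitly
}
Derived.prototype = Object.create(Base.prototype);
Derived.prototype.constructor = Derived;
Derived.prototype.describe = function () {
  // Super call with the parent named explicitly.
  return "Derived:" + Base.prototype.describe.call(this);
};

// Moving the method to an unrelated object still runs Base's method,
// which is only useful if that object has the state Base expects.
var unrelated = { describe: Derived.prototype.describe };

console.log(new Derived("d").describe());
console.log(unrelated.describe());
```

The transplanted call runs without error but prints `Derived:Base(undefined)`, which is the sense in which it does not "just work".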

<troll>C++ requires you to state the name of the super-class in super
calls, and Java doesn't.  Do we want to be like Java?</troll>

-- 
Erik Corry

Re: Private Names in 'text/javascript'

2011-05-18 Thread Erik Corry

2011/5/18 Luke Hoban lu...@microsoft.com:
 The Private Names strawman currently combines a new runtime capability
 (using both strings and private names as keys in objects) with several new
 syntactic constructs (private binding declarations, #.id).  At the March
 meeting, I recall there was some support for the idea of separating these
 two aspects, and exposing the runtime capability also as a library that
 could be used in ‘text/javascript’.


I'd like to support this idea.

Your example looks a bit wrong to me though.  It seems to make new
private names per object.  Surely we want a private name per 'class'
instead.

var Point = (function() {
  var _x = Object.createPrivateName();
  var _y = Object.createPrivateName();

  var Point = function(x, y) {
    this[_x] = x;
    this[_y] = y;
  };

  Point.prototype.myMethodUsing_xAnd_y = function() {...};

  return Point;
})();


var p = new Point(1, 2);

-- 
Erik Corry

Re: Private Names in 'text/javascript'

2011-05-18 Thread Erik Corry

2011/5/18 Andreas Rossberg rossb...@google.com:
 Separating out the functionality of abstract names certainly is a good idea.

 But is there any reason to make it a method of Object? In essence,
 private names form a new primitive type, so there should be a separate
 global object or module for them. Assuming for a minute it was called
 Name (which clearly is a suboptimal choice), then you'd rather invoke
 Name.create(), or perhaps simply Name() (by analogy with calling
 String(v) to create primitive strings, although I'm not sure I like
 the notational abuse behind it).

It would be nice to do Name("foo") or PrivateName("foo") so that the
debugger has some name to use when displaying objects with private
fields.  Alternatively, I suppose the debugger could guess the name
like it does currently for Foo.prototype.myFunc = function() {...}.  This
has the disadvantage that a library that uses unique strings to
simulate some of the features of private names on older browsers would
not be able to give the unique names a meaningful prefix.

-- 
Erik Corry

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Brendan Eich bren...@mozilla.com:
 This is a good point too. Not sure we've considered a value - value map 
 carefully yet.

A value-anything map is pretty easy to do with a normal JS object.

function get(value) {
  if (typeof(value) == 'number') return this["NUM" + value];
  if (typeof(value) == 'string') return this["STR" + value];
  ...
}

Weakness makes no sense in this case since a value can never really be lost.
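
A fuller sketch of the same idea, assuming a hypothetical `ValueMap` wrapper (the name, the extra type tags and the error for objects are additions made here). One caveat of string tagging: `0` and `-0` both become `"NUM0"` and so share an entry.

```javascript
function ValueMap() {
  this.store = Object.create(null);   // no inherited keys to collide with
}

// Turn a primitive value into a type-tagged string key.
ValueMap.prototype.keyFor = function (value) {
  if (typeof value === "number") return "NUM" + value;
  if (typeof value === "string") return "STR" + value;
  if (typeof value === "boolean") return "BOOL" + value;
  if (value === null) return "NULL";
  if (value === undefined) return "UNDEF";
  throw new TypeError("objects need a different kind of map");
};

ValueMap.prototype.set = function (key, value) {
  this.store[this.keyFor(key)] = value;
};

ValueMap.prototype.get = function (key) {
  return this.store[this.keyFor(key)];
};

var map = new ValueMap();
map.set(1, "number one");
map.set("1", "string one");

console.log(map.get(1), map.get("1"), map.get(2));
```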



 /be


Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 15, 2011, at 3:48 PM, Oliver Hunt wrote:

 On May 15, 2011, at 3:47 PM, Boris Zbarsky wrote:

 On 5/15/11 2:20 PM, Rick Waldron wrote:
 Thanks Brendan, I was looking for something that was representative of
 Boris's use-case

 A typical example is an extension wanting to associate some state with a 
 DOM element or a Window without polluting the DOM.  For example, Adblock 
 Plus wants to store some state per-element so that it knows when it's in 
 the middle of unblocking something so it'll allow the load through for that 
 one thing only.  Firebug wants to store some state per-window (e.g. script 
 bodies, etc) and discard it when the window goes away.
 Which is a use case private names would achieve much more succinctly than 
 weakmaps.

 Not if the object is frozen.

That shouldn't prevent you adding private names.  See earlier message
in this thread.

 Unless you then overspecify private names *as* weak maps, in which case we 
 are going in circles :-P.

Allowing private names on frozen objects doesn't imply the GC
semantics of weak maps so there is still a very important difference.
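
For comparison, this is how the per-element state use case looks with a weak map on a frozen object, assuming the `WeakMap` API as it was eventually standardised in ES2015. The `element` object is a stand-in for a DOM node.

```javascript
var state = new WeakMap();

// Stand-in for a DOM element that the extension must not modify.
var element = Object.freeze({ tag: "img" });

// The association lives in the map, so the frozen object is untouched.
state.set(element, { unblocking: true });

console.log(
  state.get(element).unblocking,
  Object.isFrozen(element),
  Object.keys(element)
);
```

The GC difference is not visible in a script like this: the entry becomes collectable once `element` is unreachable, even though `state` itself is still alive.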

 /be


Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 15, 2011, at 11:55 PM, Erik Corry wrote:

 2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 15, 2011, at 3:48 PM, Oliver Hunt wrote:

 On May 15, 2011, at 3:47 PM, Boris Zbarsky wrote:

 On 5/15/11 2:20 PM, Rick Waldron wrote:
 Thanks Brendan, I was looking for something that was representative of
 Boris's use-case

 A typical example is an extension wanting to associate some state with a 
 DOM element or a Window without polluting the DOM.  For example, Adblock 
 Plus wants to store some state per-element so that it knows when it's in 
 the middle of unblocking something so it'll allow the load through for 
 that one thing only.  Firebug wants to store some state per-window (e.g. 
 script bodies, etc) and discard it when the window goes away.
 Which is a use case private names would achieve much more succinctly than 
 weakmaps.

 Not if the object is frozen.

 That shouldn't prevent you adding private names.  See earlier message
 in this thread.

 Now we are going in circles. Let's try to use words already defined in ES5 
 and model things in observably distinguishable ways, or what's the point?

Are you saying the way that private names work is off limits because
the decision has already been made?

I'd appreciate it if you responded to my earlier mail in this thread
where I explained the thinking behind allowing private names on frozen
objects, rather than just saying it's impossible by definition.   In
addition to the reasoning there I will add that if it is possible (and
perhaps it isn't) to satisfy the important use cases of WeakMaps with
a modified private names proposal then we can reduce two new features
to one which AFAICS is a clear win in the
keeping-complexity-manageable effort.


 Frozen means [[Extensible]] is false, so you can't add any properties, I 
 don't care how they are named. What you are proposing here is not observably 
 different from soft fields via weak maps. That is one way to implement 
 private names but it isn't what we have been calling private names, and it 
 does not help the discussion to assume one conclusion.

 Further, some controversy remains around reflection. If any party (one with 
 access to the private name) can reflect on a private-named extension to a 
 frozen object, the object was therefore not really frozen, rather extensible.


 Unless you then overspecify private names *as* weak maps, in which case we 
 are going in circles :-P.

 Allowing private names on frozen objects doesn't imply the GC
 semantics of weak maps so there is still a very important difference.

 Not observable.

The GC properties of weak maps are hard to describe in the standard,
but they are important to the users and the implementers so I hope
they are not off limits for the discussion.

-- 
Erik Corry

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 15, 2011, at 11:53 PM, Erik Corry wrote:

 2011/5/16 Brendan Eich bren...@mozilla.com:
 This is a good point too. Not sure we've considered a value - value map 
 carefully yet.

 A value-anything map is pretty easy to do with a normal JS object.

 function get(value) {
  if (typeof(value) == 'number') return this["NUM" + value];
  if (typeof(value) == 'string') return this["STR" + value];
  ...
 }

 Where's the 'object' case (excluding null)?

Elided for clarity :-)  It can be implemented with private names or
WeakMaps.  My point was we don't need to think about maps with values
as keys.  We have that already.

-- 
Erik Corry

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/15 Brendan Eich bren...@mozilla.com:
 Besides attaching metadata, weak maps are important for remembering the
 wrapper or membrane for a given (frozen or not, built-in or host, not to
 be mutated) object identity. Mark and Andreas know too well, so I'm
 preaching to es-discuss in the To: line. This is not a use-case for weak
 references.

Is there an extra 'not' in this sentence?  "weak maps are important
[...] this is not a use-case for weak references"

Mark has mentioned membranes as an example of the use of WeakMaps.  I
can see that you don't want the membrane to keep the objects alive,
but is it a problem that the objects keep the membrane alive?  Are we
expecting lots of membranes to come and go and the GC will need to
clean up after them?  I'm not saying it isn't important, I'm just
trying to clarify the use case here.
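
The "remembering the wrapper" use can be sketched like this, assuming the `WeakMap` and `Proxy` APIs as later standardised. The `wrap` function and the empty handler are placeholders; a real membrane would intercept and wrap values crossing the boundary.

```javascript
var wrappers = new WeakMap();

// Return the one wrapper for a given object, creating it on first use.
function wrap(target) {
  var existing = wrappers.get(target);
  if (existing) return existing;
  var wrapper = new Proxy(target, {});   // empty handler: forwards everything
  wrappers.set(target, wrapper);
  return wrapper;
}

// Works for frozen objects too, since nothing is stored on the target.
var frozen = Object.freeze({ answer: 42 });

console.log(wrap(frozen) === wrap(frozen), wrap(frozen) !== frozen, wrap(frozen).answer);
```

Here the map does not keep `frozen` alive; whether the objects keeping the map's entries alive is a problem is the question raised above.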

 Weak maps are useful. I don't think they're misnamed. But we can revisit the
 name if we must.
 /be



 
  For example, Firefox extensions want to do this all the time for various
  DOM objects (and especially Window).
 
  -Boris
 




 --
     Cheers,
     --MarkM

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 16, 2011, at 12:01 AM, Brendan Eich wrote:

 On May 15, 2011, at 11:55 PM, Erik Corry wrote:

 2011/5/16 Brendan Eich bren...@mozilla.com:
 Not if the object is frozen.

 That shouldn't prevent you adding private names.  See earlier message
 in this thread.


 Frozen means [[Extensible]] is false, so you can't add any properties, ...

 I'll go further: frozen means the implementation should be free to move the 
 object and anything associated with its shallow-frozen key/value state into 
 read-only memory, so the full force of 50-year-plus OS and hardware MMU 
 protection can ensure [[Extensible]] really is false.

I think this would also preclude the most efficient implementation of
Weak Maps.  In fact I have a hard time seeing how you would GC weak
maps at all under these conditions.

-- 
Erik Corry
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 16, 2011, at 12:10 AM, Erik Corry wrote:

 2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 15, 2011, at 11:53 PM, Erik Corry wrote:

 2011/5/16 Brendan Eich bren...@mozilla.com:
 This is a good point too. Not sure we've considered a value -> value map 
 carefully yet.

 A value-anything map is pretty easy to do with a normal JS object.

 function get(value) {
  if (typeof(value) == 'number') return this[NUM + value];
  if (typeof(value) == 'string') return this[STR + value];
  ...
 }

 Where's the 'object' case (excluding null)?

 Elided for clarity :-)  It can be implemented with private names or
 WeakMaps.

 Oh, ok -- you wrote normal JS object and that seemed to preclude new stuff.

 Yes, we can make a value -> value map as a library abstraction, but it's 
 clunky. It has to use two kinds of maps under the hood. People shouldn't 
 necessarily have to reinvent or discover or curate it and download those 
 bytes all the time.

This is where we disagree fundamentally.  I think we should give
people the lego bricks to build the abstractions they need.  You seem
to think we should give them all the abstractions  they are going to
need, ready built.

As for downloading all the time that's a caching/modules/libraries
issue which needs to be solved in other ways.  It has to be possible
to write composable modules in the language that are downloaded only
once for each time a new release is created.  If we can't solve that
then the temptation will always be to add every thinkable module to
the language and the result will be constant pressure for bloat.  I
regard the i18n changes currently going in as an example of that.

 My point was we don't need to think about maps with values
 as keys.  We have that already.

 Your use of normal JS object and present tense to refer to either weak maps 
 or private names (implemented via weak maps) is making me want to wait a few 
 months and talk again :-/.

I didn't use 'normal JS object' in that sense.

The context was you talking about a value-value map.

I presented a value-anything map which was implemented as a 'normal object'.

You complained it wasn't an anything-anything map.

I said you could do that with either private properties or weak maps.
At that point you need both the 'normal object' and something else.

Perhaps we have a nomenclature problem with 'value'.  Do you regard
things that have typeof(o) == 'object' as values?
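
A minimal sketch of the combination described above: a plain object handles primitive keys through a type prefix, and a WeakMap handles object keys. The class name AnyMap and the prefix scheme are hypothetical, chosen only for illustration, and symbol keys are deliberately not handled.

```javascript
// Hypothetical helper: a map that accepts primitive keys and object keys.
class AnyMap {
  constructor() {
    // 'Normal object' part; no prototype, so no inherited keys get in the way.
    this.primitives = Object.create(null);
    // Object-keyed part; entries do not keep their keys alive.
    this.objects = new WeakMap();
  }

  static isObject(key) {
    return (typeof key === "object" && key !== null) || typeof key === "function";
  }

  static tag(key) {
    // Prefix with the type so that 1 and "1" stay distinct.
    return typeof key + ":" + String(key);
  }

  set(key, value) {
    if (AnyMap.isObject(key)) this.objects.set(key, value);
    else this.primitives[AnyMap.tag(key)] = value;
    return this;
  }

  get(key) {
    return AnyMap.isObject(key)
      ? this.objects.get(key)
      : this.primitives[AnyMap.tag(key)];
  }
}
```

The two halves are independent, which is the point made in the message: the primitive half needs nothing new, and only the object half needs private names or weak maps.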

-- 
Erik Corry
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 16, 2011, at 12:18 AM, Erik Corry wrote:

 2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 16, 2011, at 12:01 AM, Brendan Eich wrote:

 On May 15, 2011, at 11:55 PM, Erik Corry wrote:

 2011/5/16 Brendan Eich bren...@mozilla.com:
 Not if the object is frozen.

 That shouldn't prevent you adding private names.  See earlier message
 in this thread.


 Frozen means [[Extensible]] is false, so you can't add any properties, ...

 I'll go further: frozen means the implementation should be free to move the 
 object and anything associated with its shallow-frozen key/value state into 
 read-only memory, so the full force of 50-year-plus OS and hardware MMU 
 protection can ensure [[Extensible]] really is false.

 I think this would also preclude the most efficient implementation of
 Weak Maps.  In fact I have a hard time seeing how you would GC weak
 maps at all under these conditions.

 Why, because GC metadata has to go in the same page as the frozen object that 
 might be a key in a weak map?

I think the objects used as keys in weak maps need to be somehow
annotated with this information so that the GC can clean up the weak
maps when the keys die.  This means that if you take an object that is
frozen and use it as a key in a weak map then it will need to be
mutated in some way and can't be on a read-only page.

Perhaps you have a different, efficient, implementation.  I can't see
us gaining much from putting frozen objects on read-only pages, thus I
can't accept it as a very strong argument about the way that frozen
objects should work together with a new feature.

 Weak maps are in Firefox nightlies. We're playing with page protection too 
 (not for freezing, yet). This seems like a dare, but it also seems to be 
 dodging my point in replying again: that private names cannot be used to 
 extend frozen objects in the [[Extensible]] = true sense of the spec.

Is there a description anywhere about how you have implemented GC of weak maps?


 /be


___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Brendan Eich bren...@mozilla.com:
 On May 16, 2011, at 12:11 AM, Erik Corry wrote:

 2011/5/15 Brendan Eich bren...@mozilla.com:
 Besides attaching metadata, weak maps are important for remembering the
 wrapper or membrane for a given (frozen or not, built-in or host, not to
 be mutated) object identity. Mark and Andreas know too well, so I'm
 preaching to es-discuss in the To: line. This is not a use-case for weak
 references.

 Is there an extra 'not' in this sentence?  weak maps are important
 [...] this is not a use-case for weak references

 Weak references != weak maps.

Thanks, now I see what you meant.  Don't know how I missed that.

 Mark has mentioned membranes as an example of the use of WeakMaps.  I
 can see that you don't want the membrane to keep the objects alive,
 but is it a problem that the objects keep the membrane alive?  Are we
 expecting lots of membranes to come and go and the GC will need to
 clean up after them?  I'm not saying it isn't important, I'm just
 trying to clarify the use case here.

 I'm not sure what this has to do with weak references not being usable -- at 
 all -- for membranes associated with (possibly frozen, 
 not-to-be-mutated-in-any-event) objects.

Nothing, that was just me reading 'weak references' as 'weak maps'.

 There is no there in which to store a weak reference to the membrane from 
 the object. Pigeon-hole problem, frozen object vs. mutation problem, 
 host-object with crazy/zero storage semantics problem, the list goes on.

Perhaps I was unclear:  The semantics of Weak Maps are essentially
symmetric.  If you have the map and the key you can get the value.  If
you have just one then you can't (and the system may GC the value).
My question was, do the use cases have both the GC of the map and the
key triggering the GC of the value or is the GC of the key the
important one and GC of the map not that common/important etc.

For the use case mentioned by Boris in this thread, where a FF
extension needs to attach metadata to an object it doesn't seem likely
that the mapping will get lost and need to be GCed before the objects
that have the metadata attached.
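
As a rough illustration of the metadata use case, here is a sketch using WeakMap as it was later standardized. The helper names annotate and annotationOf are hypothetical, and the plain object stands in for a DOM node.

```javascript
// One long-lived map owned by the extension; the keys are the annotated objects.
const metadata = new WeakMap();

// Attach side data without adding any property to the target.
function annotate(target, info) {
  metadata.set(target, info);
  return target;
}

// Look the side data up again; needs both the map and the key.
function annotationOf(target) {
  return metadata.get(target);
}

const node = { tagName: "DIV" }; // stand-in for a DOM object
annotate(node, { visited: true });
```

In this shape the map outlives most of its keys, which matches the case where collection of the key is the important trigger.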

 But in case it helps: yes, membranes need to be GC'ed ahead of their wrapped 
 objects.

 /be


___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

2011/5/16 Andreas Gal g...@mozilla.com:

 Even if you want to store weak-map specific meta data per object, nobody 
 would store that directly in the object. That's needlessly cruel on the cache 
 since 99.9% of objects never end up in a weak map. Instead one would locate 
 that meta data outside the object in a direct mapped dense area (like mark 
 bitmaps), which is on its own page that is not write protected.

More than 99.9% of objects don't have a property called fish.
Nevertheless if someone adds a fish property to an object V8 will
try to (and usually succeed in) storing it in the object and it won't
be cruel on the cache.  Quite the opposite.


 Andreas

 On May 16, 2011, at 10:39 AM, Brendan Eich wrote:

 On May 16, 2011, at 12:47 AM, Erik Corry wrote:

 I think the objects used as keys in weak maps need to be somehow
 annotated with this information so that the GC can clean up the weak
 maps when the keys die.  This means that if you take an object that is
 frozen and use it as a key in a weak map then it will need to be
 mutated in some way and can't be on a read-only page.

 That's already false in Firefox nightlies. We support Object.freeze. We have 
 a WeakMap implementation. We do not mutate the frozen object. Its GC 
 metadata does not reside in a header for it, or even in the same OS page.


 Perhaps you have a different, efficient, implementation.  I can't see
 us gaining much from putting frozen objects on read-only pages, thus I
 can't accept it as a very strong argument about the way that frozen
 objects should work together with a new feature.

 This is a bit too subjective an argument, sorry.

 My point about 50+ years of OS and MMU firewalling is important. Chrome 
 (recently hacked by French spook-types, but also hacked over a year ago with 
 a two-step attack) is a convincing example.

 Sure, we have user-code isolation tools in our belts, including fancy 
 compiler/runtime pairs. But it's hard to beat processes if you want to be 
 sure. No silver bullet, simply stronger isolation.


 Weak maps are in Firefox nightlies. We're playing with page protection too 
 (not for freezing, yet). This seems like a dare, but it also seems to be 
 dodging my point in replying again: that private names cannot be used to 
 extend frozen objects in the [[Extensible]] = true sense of the spec.

 Is there a description anywhere about how you have implemented GC of weak 
 maps?

 http://hg.mozilla.org/tracemonkey/rev/7dcd0d16cc08

 Look for WeakMap::mark... names. There's no need to mutate a key object. 
 There should not be, either.

 Yes, this GC can iterate. A lot, but a fix doesn't obviously require 
 mutating (possibly frozen) key objects. Also, since POITROAE we are going to 
 measure twice, Optimize once.
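
The iteration mentioned above can be pictured with a toy model. This is not the Firefox code from the linked changeset; the object layout (a refs array per object), the ephemerons list and the name markReachable are all assumptions made for the sketch.

```javascript
// Toy mark phase. Every object is a record with a `refs` array of strong
// references. Each ephemeron {map, key, value} models one weak map entry:
// the value is live only if both the map and the key are live.
function markReachable(roots, ephemerons) {
  const marked = new Set();

  // Ordinary transitive marking through strong references.
  function markFrom(start) {
    const stack = [start];
    while (stack.length > 0) {
      const current = stack.pop();
      if (marked.has(current)) continue;
      marked.add(current);
      for (const ref of current.refs) stack.push(ref);
    }
  }

  for (const root of roots) markFrom(root);

  // Marking a value can make further keys live, so repeat until nothing changes.
  let changed = true;
  while (changed) {
    changed = false;
    for (const { map, key, value } of ephemerons) {
      if (marked.has(map) && marked.has(key) && !marked.has(value)) {
        markFrom(value);
        changed = true;
      }
    }
  }

  // The mark state lives in a side table; no key object is written to.
  return marked;
}
```

Because the mark bits are kept outside the objects, the keys in this model can be frozen without affecting the collector.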

 /be


___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Use cases for WeakMap

2011-05-16 Thread Erik Corry

On May 16, 2011 7:30 PM, Brendan Eich bren...@mozilla.com wrote:

 On May 16, 2011, at 12:43 AM, Erik Corry wrote:

 Perhaps we have a nomenclature problem with 'value'.  Do you regard
 things that have typeof(o) == 'object' as values?


 Certainly.

This is the source of our confusion.  I was meaning 'primitive value' when I
wrote 'value'.  Sorry about that.


 --

Erik Corry
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Use cases for WeakMap

2011-05-15 Thread Erik Corry

On May 15, 2011 2:55 AM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 5/14/11 6:37 PM, Oliver Hunt wrote:

 Can you provide a use case where you have an object key as the usual
programming idiom?


 Attaching metadata to object without polluting the objects themselves.

That sounds more like private names


 For example, Firefox extensions want to do this all the time for various
DOM objects (and especially Window).

 -Boris

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Use cases for WeakMap

2011-05-15 Thread Erik Corry

On May 15, 2011 8:59 AM, Mark S. Miller erig...@google.com wrote:



 On Sat, May 14, 2011 at 11:57 PM, Erik Corry erik.co...@gmail.com wrote:


 On May 15, 2011 2:55 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 
  On 5/14/11 6:37 PM, Oliver Hunt wrote:
 
  Can you provide a use case where you have an object key as the usual
programming idiom?
 
 
  Attaching metadata to object without polluting the objects themselves.

 That sounds more like private names

 Private name and soft field lookup inherits. WeakMap lookup doesn't.

In the case where you care about this you can use hasOwnProperty.
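
A small sketch of the difference under discussion. Private names were only a proposal at the time, so a Symbol is used here as a rough stand-in; that substitution is an assumption, not something the thread specifies.

```javascript
const secret = Symbol("secret"); // stand-in for a private name (assumption)

const parent = {};
const child = Object.create(parent);

// Property-based side data lives on the parent object.
parent[secret] = "from parent";

// WeakMap-based side data is keyed on the parent's identity.
const side = new WeakMap();
side.set(parent, "from parent");

// Property lookup walks the prototype chain, so the child sees the value.
const viaProperty = child[secret];

// WeakMap lookup does not inherit; the child is a different key.
const viaWeakMap = side.get(child);

// hasOwnProperty filters out the inherited hit when that matters.
const ownOnly = Object.prototype.hasOwnProperty.call(child, secret);
```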

And I think you should be able to add private properties to frozen objects.
Since the properties are private they are invisible to whoever froze the
object so they can't violate the expectations of the freezer.  Perhaps it
also needs to be possible to freeze individual private properties (or their
absence) but being able to put a blanket ban on other parts of the program
performing an action that is undetectable to you feels wrong.

In terms of implementation this is no worse than keying a weak map with a
frozen object since that will also involve modifying the frozen object.
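
For reference, this is how the two approaches behave at the language level in ECMAScript as later standardized. It says nothing about how an engine implements either one. Symbols again stand in for private names, which is an assumption, and the helper name tryExtend is hypothetical.

```javascript
// Try to add a property in strict mode; report whether it succeeded.
function tryExtend(target, key) {
  "use strict";
  try {
    target[key] = true;
    return true;
  } catch (error) {
    return false; // TypeError: the object is not extensible
  }
}

const frozen = Object.freeze({ id: 1 });

// Adding a symbol-keyed property to a frozen object is rejected.
const addedSymbol = tryExtend(frozen, Symbol("hidden"));

// Using the same frozen object as a WeakMap key works.
const notes = new WeakMap();
notes.set(frozen, "side data");
const note = notes.get(frozen);
```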




 
  For example, Firefox extensions want to do this all the time for
various DOM objects (and especially Window).
 
  -Boris
 




 --
 Cheers,
 --MarkM
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: iteration order for Object

2011-03-11 Thread Erik Corry

2011/3/11 Charles Kendrick char...@isomorphic.com:
 All browsers that have ever had non-negligible market share have implemented
 order-preserving Objects - until Chrome 6.

Just to be clear:  Chrome 5 and before had a for-in ordering that
revealed internal optimization strategies.  From Chrome 6 and forward
the behaviour was made consistent.  There was never a version of
Chrome that consistently iterated numeric-keyed properties in
insertion order.
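
For comparison, the ordering that current engines follow (and that later editions of the specification define for own property keys) puts integer-like keys first in ascending order, then string keys in insertion order. A short sketch:

```javascript
const obj = {};
obj.b = 1;
obj[2] = 1;
obj.a = 1;
obj[1] = 1;

// Integer-like keys come first, ascending; other string keys keep insertion order.
const keyOrder = Object.keys(obj);

// for-in visits the own enumerable keys in the same order in current engines.
const forInOrder = [];
for (const key in obj) forInOrder.push(key);
```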

-- 
Erik Corry
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Additional language features

2011-03-07 Thread Erik Corry

2011/3/5 Christian Mayer m...@christianmayer.de:

 Hello together!

 Currently I'm writing a (for me) large project that is heavily using
 JavaScript / ECMAScript. During that project I found a few features
 missing in the language that could easily be added and where I think
 that many programmers could profit from:

 1) A printf compatible format string
 

 For example the String object could be extended by a sprintf type of
 function / method that takes a printf compatible format string and the
 additional values. Currently there are lots of libraries that provide
 that functionality - but all cover only a small subset and it's not known
 which are in good shape or not...

This seems like a library and not a language issue.  Why not just pick
one, add the stuff you need, open source the result?  Unless you are
willing to wait several years for browsers to support your project you
will have to do this anyway.

 2) A binary type conversion including float
 ===

 My project is using AJAX technology to transmit measurement data over
 the net in JSON notation. This data contains IEEE float values, encoding
 their bit and byte representation in hex values.

That seems like a poor decision for an interchange format.  ASCII fp
values work rather well.  What is the space penalty for decimal vs hex
of gzipped data once you have already taken the hit for the JSON
overhead?

 In the browser I need to convert that hex string back into a float value
 - which is quite complicated as I have to implement an IEEE 754 parser.
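
Where typed arrays and DataView are available (they were not universal in browsers when this was written), the conversion needs no hand-written IEEE 754 parser. A sketch, assuming the wire format is eight hex digits for a big-endian 32-bit float; the function name hexToFloat32 is hypothetical.

```javascript
// Convert 8 hex digits (big-endian IEEE 754 single precision) to a number.
function hexToFloat32(hex) {
  if (!/^[0-9a-fA-F]{8}$/.test(hex)) {
    throw new RangeError("expected exactly 8 hex digits");
  }
  const view = new DataView(new ArrayBuffer(4));
  // DataView reads and writes big-endian unless told otherwise.
  view.setUint32(0, parseInt(hex, 16));
  return view.getFloat32(0);
}
```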




 Especially for the use with WebSockets it would be a great help to have
 a functionality that allows to pack and unpack binary data into
 ECMAScript objects.

 A possible syntax (and much better description of what I need) give the
 pack and unpack commands in the Perl language.

 3) A fast library for small, fixed size vectors (numerical arrays)

How fast does this need to be?  Did you already do benchmarks on
modern browsers for the stuff you need?

 ==

 For 2D and 3D purposes it would be great to have a data type / object
 that is specialized for 2D, 3D and 4D values. It might even internally
 map to a SIMD datatype if the CPU running the interpreter is supporting
 it (e.g. a SSE type value for x86 processors; other CPU architectures
 have similar extensions).
 Especially for the Canvas and WebGL it would be great to have such a
 data type.

 For this data type it would be great to have a library supporting it and
 provide linear algebra functionality (matrix multiplication, etc.)
 The C++ library Eigen2 provides everything that is necessary (and even
 more...)

 So I hope I didn't write my ideas to the wrong list (if it is so, please
 correct me!). And I hope you can tell me if those additions could make
 it on the next spec of ECMAScript.

In general I am of the opinion that if the language already offers the
building bricks that you need to do your task then it requires some
special reason why you can't just make a JS level library that fits
your needs.  Basically, you need to have tried it and found that it
doesn't work for you for some fundamental reason.  Speed is not
necessarily an argument, since performance is improving all the time
and improvements to the optimization of JS code is the rising tide
that lifts all boats.  Adding the library you need to the platform has
several disadvantages:

* It makes the platform large and unwieldy.  This worsens download
times, security surface, learning curve.
* It takes years before you can rely on support.
* It locks down the API prematurely, where JS based libraries are free
to evolve.

The role of a language standards body is to say no most of the time.
Anything else leads to a monster of a standard.

-- 
Erik Corry
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: [whatwg] Cryptographically strong random numbers

2011-02-22 Thread Erik Corry

Having got up to speed on this discussion after a holiday here are my
thoughts from a V8er:

* I would prefer to see this in the DOM rather than in the language.
Can't see why it needs to be in the language itself.  Other languages
generally don't have cryptographically secure random numbers in the
language itself, they put them in the libraries.  Other users of the
DOM http://www.w3.org/DOM/Bindings might want to use the feature too.
On the other hand this is no big deal.  If the committee feels that
putting it in the language makes it somehow more righteous then put it
there.

* But let's not invoke node.js as a reason to put it in the language.
Generally their policy is not to put things in the core that could be
libraries.  Seems to me that putting 'var crypto = require("crypto");'
works just fine for node and can give you a completely compatible
interface.

* Let's keep the interface simple.  All considerations of GC or
efficiency look completely specious to me given the amount of data
involved (measured in bits).  Returning strings seems strange (crypto
data is numbers.  What about surrogate pairs - do we really want
something in the standard that returns incorrect UTF-16?).  I would be
in favour of any interface that returns regular arrays of numbers or
even simpler just a single number, either 0-1 like Math.random() or 8-
or 32-bit values.  But only one function please.

* I realize that Math.random() is not an ideal interface to emulate.
The reason I suggest it is that putting a new API into the core
language that is completely unlike Math.random() feels very PHP like:
http://www.tnx.nl/php.html  Even though Math.random() and the new
interface are not interchangeable it seems to me that there is value
in consistency in terms of mental effort needed to memorize the
APIs.
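
To make the "one function, one number" suggestion concrete, here is a sketch for node using its built-in crypto module. The names randomUint32 and randomUnitInterval are hypothetical, and this is only one possible shape for such an interface, not what was eventually standardized for browsers.

```javascript
const nodeCrypto = require("crypto");

// One call, one cryptographically strong 32-bit unsigned integer.
function randomUint32() {
  return nodeCrypto.randomBytes(4).readUInt32BE(0);
}

// Math.random()-style variant in [0, 1), carrying 32 bits of randomness.
function randomUnitInterval() {
  return randomUint32() / 0x100000000;
}
```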

2011/2/18 Oliver Hunt oli...@apple.com:
 I don't think generating 16bit values is beneficial -- either 8bit values for 
 byte at a time processing or full [u]int32 makes more sense.  I think the 
 only reason for 16bit to come up is ecmascript's notionally 16bit characters.

 I would prefer polymorphism of some form, for example

 any gimmeRandom(numberOfRandomElements, constructor = [[]Builtin Array 
 constructor], min = 0, max = 132 - 1) {
    var result = new constructor(numberOfRandomElements);
    for (var i = 0; i < numberOfRandomElements; i++)
          result[i] = reallyRandomValue(min, max);
    return result;
 }

 This would solve all the use cases presented so far (except a string of 
 cryptographically secure values, which can be handled trivially so i've left 
 that as a task for the reader ;) ), and would avoid the shortcomings of the 
 existing array methods (only a finite set of types can be produced as output).

 --Oliver


 On Feb 16, 2011, at 5:37 PM, Brendan Eich wrote:

 On Feb 16, 2011, at 5:33 PM, Allen Wirfs-Brock wrote:

 On Feb 16, 2011, at 4:54 PM, David Herman wrote:

 I say: let's make it typed array in the short term, but TC39 will spec it 
 as an array of uint32 according to the binary data spec. We will try to 
 make the binary data spec as backwards compatible as possible with typed 
 arrays anyway. So in the near term, implementors can use typed arrays, but 
 when they have implementations of the full binary data spec, they can 
 change to use those. It'll probably be a slightly backwards-incompatible 
 change, but by keeping them relatively API-compatible, it shouldn't make 
 too much difference in practice. Plus we can warn people that that change 
 is coming.

 Dave, most browsers other than FF4 internally box all  integers with with 
 32-significant bits.

 I'm not sure this is still true. Certainly on x64, but also on x86, 
 NaN-boxing has taken over in many VMs.


 Some may box with 31 or even 30 significant bits.  So if we spec. the value 
 as a  uint32 and (they are truly random enough) then at least half and 
 possible 75% or more of the values in the array will be boxed in many 
 browsers.  Such boxing will have a much higher cost than immediate uint16 
 values.  That's why I propose 16-bit values.

 Given the implementors on es-discuss, we should survey first. I'd hate to 
 prematurely (de-)optimize.

 I agree with David Wagner that the API has to be dead-simple, and it seems 
 to me having only 16-bit values returned in a JS array may tend to result in 
 more bit-mixing bugs than if we used 32-bit elements, if programmers are 
 carelessly porting code that was written for uint32 arrays.

 /be

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: [whatwg] Cryptographically strong random numbers

2011-02-22 Thread Erik Corry

I can find Klein's complaints that the implementation of Math.random is
insecure but not his complaints about the API.  Do you have a link?

It seems pretty simple to generate a random number from 1 to 2 by fixing the
exponent and mixing in 52 bits of random mantissa. Subtract 1 to get an
evenly distributed value from 0-1. Multiply and Math.floor or  to get
your 8, 16, or 32 bits of randomness.
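
The construction just described can be sketched as follows. The 52 random bits are passed in as two integers so the sketch stays independent of any particular random source; the function names are hypothetical.

```javascript
// Build a double in [0, 1) from 52 random bits: fix the exponent so the
// value lies in [1, 2), fill the mantissa with the bits, then subtract 1.
function unitDoubleFromBits(high20, low32) {
  const view = new DataView(new ArrayBuffer(8));
  // Sign 0, biased exponent 0x3ff (2^0), then the top 20 mantissa bits.
  view.setUint32(0, 0x3ff00000 | (high20 & 0xfffff));
  // The remaining 32 mantissa bits.
  view.setUint32(4, low32 >>> 0);
  return view.getFloat64(0) - 1;
}

// Multiply and floor to get an 8-bit value from the unit-interval double.
function byteFromUnitDouble(x) {
  return Math.floor(x * 256);
}
```

Every value in [1, 2) shares one exponent, so the 52 mantissa bits map to evenly spaced doubles and the subtraction is exact.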
On Feb 22, 2011 11:04 PM, Brendan Eich bren...@mozilla.org wrote:
 On Feb 22, 2011, at 2:00 PM, Jorge wrote:

 On 22/02/2011, at 22:36, Brendan Eich wrote:
 (...)

 However, Math.random is a source of bugs as Amit Klein has shown, and
these can't all be fixed by using a better non-CS PRNG underneath
Math.random and still decimating to an IEEE double in [0, 1]. The use-cases
Klein explored need both a CS-PRNG and more bits, IIRC. Security experts
should correct amateur-me if I'm mistaken.

 .replace( /1]/gm, '1)' ) ?

 Right.

 Reading more of Amit Klein's papers, the rounding to IEEE double also
seems problematic. Again, I'm not the crypto-droid you are looking for.

 /be

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: [whatwg] Cryptographically strong random numbers

2011-02-22 Thread Erik Corry

Thanks for the link. Having read the section in question I am satisfied that
the author has no problem with the API.
On Feb 23, 2011 12:34 AM, Brendan Eich bren...@mozilla.org wrote:
 On Feb 22, 2011, at 2:49 PM, Erik Corry wrote:
 I can find Klein's complaints that the implementation of Math.random is
insecure but not his complaints about the API. Do you have a link?

 In the paper linked from http://seclists.org/bugtraq/2010/Dec/13 section 3
(3. The non-uniformity bug), viz:

 Due to issues with rounding when converting the 54 bit quantity to a
double precision number (as explained in
http://www.trusteer.com/sites/default/files/Temporary_User_Tracking_in_Major_Browsers.pdf section
2.1), x2 may not accurately represent the state bits if the whole
double precision number is ≥0.5.

 but that link dangles, and I haven't had time to read more.

 The general concern about the API arises because Adam's API returns a
typed array result that could have length > 1, i.e., not a random result
that fits in at most 32 (or even 53) bits.

 /be
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Stupid i18n use cases question

2011-01-30 Thread Erik Corry

On Jan 29, 2011 8:37 PM, Mark Davis ☕ m...@macchiato.com wrote:

 There are really 5 cases at issue:
 1. Code point breaks
 2. Grapheme-Cluster breaks (with three possible variants: 'legacy', extended,
    and aksha)
 3. Word breaks
 4. Line breaks
 5. Sentence breaks
 Notes:
 #1 is pretty trivial to do right in ES.
 The others can be done in ES, but the code is more complicated -- the
biggest issue is that they require a download of a possibly substantial
amount of data. For certain languages, #3 requires considerable code and
data.
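
As an illustration of why the first case needs no external data, here is a sketch that splits a string at code point boundaries by pairing UTF-16 surrogates. The function name splitCodePoints is hypothetical.

```javascript
// Split a string into code points, keeping surrogate pairs together.
function splitCodePoints(text) {
  const out = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    const nextIsLow = next >= 0xdc00 && next <= 0xdfff;
    if (isHigh && nextIsLow) {
      out.push(text.slice(i, i + 2)); // one code point, two code units
      i++;
    } else {
      out.push(text[i]); // BMP character or unpaired surrogate
    }
  }
  return out;
}
```

The other four cases depend on Unicode property tables and, for some languages, dictionaries, which is where the download cost comes from.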

The argument that large amounts of data must be downloaded for one language
can't be used to argue that users should be forced to download that data for
all languages in the world.  The alternative, that the browser make use of
data from the OS, is a fragmentation and testability nightmare.

Fonts have similar issues. In that case we are moving to downloadable fonts.
That seems like the right way to go for I18n data too.  Issues of the
cacheability of large font and i18n data are important but not in the scope
of ES.

Moving to a downloadable I18n data architecture also solves the collation
order issues mentioned by Shawn recently where the front end and back end
disagree on collation due to all the issues he mentioned.  All those issues
apply to testing and the homogeneity and testability of the web platform.

 Word-breaks are different than linebreaks; the latter are the points where
you can wrap a line, which may include more than a word or come in the
middle of a word.
 For examples, see http://unicode.org/cldr/utility/breaks.jsp.

 I don't know about the specific use cases that Jungshik had in mind, but
if you are doing client-side word-processing in ES (which various software
does, including ours), then you want all of these, except perhaps #5. For
example, a double-click uses #3.

 There are other use cases for #4 besides word processing; for example,
break up long SMS's, we break at line-boundaries. I'm not saying that
someone has to do this in ES; just giving an example outside of the
word-processing domain.

 Mark

 — Il meglio è l’inimico del bene —


 On Sat, Jan 29, 2011 at 10:25, Shawn Steele shawn.ste...@microsoft.com
wrote:

 On the phone yesterday we mentioned word/line breaking and grapheme
clusters.   It didn't occur to me to ask about the use cases.



 Why does someone need word/line breaking in js?  It seems like that would
better be done by my rendering engine, like the HTML layout engine or my
edit control or something?



 -Shawn



  

 http://blogs.msdn.com/shawnste





___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: RE: Stupid i18n use cases question

2011-01-30 Thread Erik Corry

If downloading the data is insufficient then download the code too (in js).

Responsibility for correctness lies with the webpage authors.

The alternative is a new way to make windows-only webpages or even pages
that only work in one version of windows. It's exactly the same issues as
fonts which also contain code and data. If they can be made downloadable
then so can locale data.

Windows and the others don't even agree on the names of the locales. And it
goes downhill from there.
On Jan 30, 2011 7:32 PM, Shawn Steele shawn.ste...@microsoft.com wrote:
 Downloading the data is insufficient for collation; you'd also have to
ensure that the code processing the data is v1.0 or 1.1 or X.X. And that
there weren't any errors or discrepancies between implementations. I think
you'd quickly discover that isn't possible to guarantee. Even if everyone
agreed to use ICU and the UCA there'd be lots of differences. Also: who's
going to collect (and provide) the data to be downloaded? What's the fallback
when the data isn't available?



 I'm still trying to grok word processing in JavaScript (beyond the
simple case), however for sorting I think it's way better to provide an
architecture that works with an understanding that collation can't be
consistent between machines, at least for the foreseeable future.


 -Shawn

  
 http://blogs.msdn.com/shawnste

 
 From: es-discuss-boun...@mozilla.org [es-discuss-boun...@mozilla.org] on
behalf of Erik Corry [erik.co...@gmail.com]
 Sent: Sunday, January 30, 2011 12:32 AM
 To: Mark Davis ☕
 Cc: Mads Ager; Shawn Steele; es-discuss@mozilla.org
 Subject: Re: Stupid i18n use cases question


 On Jan 29, 2011 8:37 PM, Mark Davis ☕ m...@macchiato.com wrote:

 There are really 5 cases at issue:
 1. Code point breaks
 2. Grapheme-Cluster breaks (with three possible variants: 'legacy',
    extended, and aksha)
 3. Word breaks
 4. Line breaks
 5. Sentence breaks
 Notes:
 #1 is pretty trivial to do right in ES.
 The others can be done in ES, but the code is more complicated -- the
biggest issue is that they require a download of a possibly substantial
amount of data. For certain languages, #3 requires considerable code and
data.

 The argument that large amounts of data must be downloaded for one
language can't be used to argue that users should be forced to download that
data for all languages in the world. The alternative, that the browser make
use of data from the OS, is a fragmentation and testability nightmare.

 Fonts have similar issues. In that case we are moving to downloadable
fonts. That seems like the right way to go for I18n data too. Issues of the
cacheability of large font and i18n data are important but not in the scope
of ES.

 Moving to a downloadable I18n data architecture also solves the collation
order issues mentioned by Shawn recently where the front end and back end
disagree on collation due to all the issues he mentioned. All those issues
apply to testing and the homogeneity and testability of the web platform.

 Word-breaks are different than linebreaks; the latter are the points
where you can wrap a line, which may include more than a word or come in the
middle of a word.
 For examples, see http://unicode.org/cldr/utility/breaks.jsp.

 I don't know about the specific use cases that Jungshik had in mind, but
if you are doing client-side word-processing in ES (which various software
does, including ours), then you want all of these, except perhaps #5. For
example, a double-click uses #3.

 There are other use cases for #4 besides word processing; for example,
to break up long SMSs, we break at line boundaries. I'm not saying that
someone has to do this in ES; just giving an example outside of the
word-processing domain.

 Mark

 — Il meglio è l’inimico del bene —


 On Sat, Jan 29, 2011 at 10:25, Shawn Steele shawn.ste...@microsoft.com wrote:

 On the phone yesterday we mentioned word/line breaking and grapheme
clusters. It didn't occur to me to ask about the use cases.



 Why does someone need word/line breaking in js? It seems like that would
better be done by my rendering engine, like the HTML layout engine or my
edit control or something?



 -Shawn



  

 http://blogs.msdn.com/shawnste





Re: lookbehind regex

2011-01-19 Thread Erik Corry

http://www.google.com/search?q=lookbehind+es-discuss

2011/1/19 Corey Hart co...@codenothing.com:
 Just curious if there has been any discussion on supporting lookbehind in
 regular expressions. Seems strange that there is no support since lookaheads
 are
 supported(https://mail.mozilla.org/pipermail/es-discuss/2009-February/008719.html).

Re: Operator Overloading

2011-01-19 Thread Erik Corry

2011/1/19 Cormac Flanagan cor...@cs.ucsc.edu:
 On Mon, Jan 10, 2011 at 2:12 AM, Erik Corry erik.co...@gmail.com wrote:
 On the concrete proposal I note that the strawman claims this can be
 used to implement bignums, but it seems to me there is no way to
 implement storage of arbitrary size so that would appear to be
 impossible.  Am I missing something?

 The operator handler could, for example, contain an array of JS ints
 to represent a bignum.

Can different bignums have arrays of different sizes?
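
To make the question concrete, here is a minimal sketch of a bignum held
as an array of small integers whose length grows with the value.  It is
not taken from the strawman: the names BASE and addBignum, the base of
10000 and the little-endian limb order are all assumptions chosen for
illustration.

```javascript
const BASE = 10000; // each array element (limb) holds four decimal digits

// Adds two non-negative bignums stored as little-endian limb arrays.
// The operands may have different lengths; the result may be longer than both.
function addBignum(a, b) {
  const result = [];
  let carry = 0;
  for (let i = 0; i < Math.max(a.length, b.length) || carry > 0; i++) {
    const sum = (a[i] || 0) + (b[i] || 0) + carry;
    result.push(sum % BASE);
    carry = Math.floor(sum / BASE);
  }
  return result;
}
```

Here addBignum([9999, 9999], [1]) needs a third limb for the carry, which
is exactly the variable-size storage being asked about.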

 Another question:  As I read the proposal, the dispatch is on the left
 hand side of binary operators.  Does the proposal have a way for a+b
 to work where a is an old-fashioned number and b is a complex?

 Yes. In this case where a is an old-fashioned number and b is a complex,
 the radd trap of b is invoked, passing argument a.  I've updated the strawman
 to clarify.

How about if the left hand side is a complex number and the right hand
side is a decimal fp number?
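
The dispatch order under discussion can be sketched as follows.  This is
only an illustration, not the strawman's mechanism: dispatchAdd and the
makeComplex helper are hypothetical, and the sketch assumes that a left
operand with an add trap wins, and that otherwise the right operand's
radd trap is consulted.

```javascript
// Hypothetical dispatcher standing in for the a + b operator.
function dispatchAdd(left, right) {
  const hasTrap = (value, name) =>
    value !== null && typeof value === "object" && typeof value[name] === "function";
  if (hasTrap(left, "add")) return left.add(right);     // left operand decides first
  if (hasTrap(right, "radd")) return right.radd(left);  // otherwise the right operand's reverse trap
  return left + right;                                  // plain numbers and strings
}

// Hypothetical complex number with both traps.
function makeComplex(re, im) {
  return {
    re,
    im,
    add(other) {
      return typeof other === "number"
        ? makeComplex(re + other, im)
        : makeComplex(re + other.re, im + other.im);
    },
    radd(other) {
      return makeComplex(other + re, im);
    },
  };
}
```

With this ordering a complex left operand has to understand every right
operand type itself, which is the difficulty raised by the complex plus
decimal case.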

-- 
Erik Corry

Re: RegExp deviation from Perl 5 - By Design or Spec Bug?

2011-01-19 Thread Erik Corry

2011/1/19 Dave Fugate dfug...@microsoft.com:
 I’ve come across the following RegExp scenario:

     /\s^/m.test("\n")



 which results in true, yet the equivalent in Perl 5:

 "\n" =~ m/\s^/m



 yields false.



 It seems that the JavaScript implementations I tested against are correctly
 implementing ES5.  That is, 15.10.2.6, step 4 states that if the character
 before the current endIndex is a newline then return true. After matching
 the \s, the previous character is indeed a newline, so the assertion
 succeeds.



 Am I misinterpreting ES5 here and the implementations are exhibiting
 incorrect behavior *or* is the spec not matching Perl 5?  If it’s the
 latter, would this be considered a spec defect as ES5’s RegExp feature is
 roughly a subset of Perl 5?

There are quite a few places where ES differs from perl.

Why is it that perl does not match here?
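
For reference, the ES side of the comparison can be reproduced with the
short script below.  The variable names are made up for the example; the
behaviour follows from the rule quoted above, that in multiline mode ^
succeeds when the preceding character is a line terminator.

```javascript
const caretAfterWhitespace = /\s^/m;

// \s consumes the newline, then ^ succeeds because the previous
// character is a line terminator.
const matchesNewline = caretAfterWhitespace.test("\n");

// \s consumes the space, but ^ fails: the position is neither the start
// of the input nor preceded by a line terminator.
const matchesSpace = caretAfterWhitespace.test(" ");

// Without the m flag, ^ only matches at the very start of the input.
const matchesWithoutFlag = /\s^/.test("\n");

console.log(matchesNewline, matchesSpace, matchesWithoutFlag); // true false false
```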

-- 
Erik Corry

Re: Operator Overloading

2011-01-10 Thread Erik Corry

2011/1/10 thaddee yann tyl thaddee@gmail.com:
 I see no reason to name the floordiv trap that way. Python has a

Agreed.

On a slightly more high level note it seems like there is a very large
number of complex proposals being poured into Harmony.  If they are
all implemented the language will become unwieldy and complex both for
users and implementers.  Is there a sense in which we have a
'complexity budget' or does the committee feel we can add all
proposals to the language?

On the concrete proposal I note that the strawman claims this can be
used to implement bignums, but it seems to me there is no way to
implement storage of arbitrary size so that would appear to be
impossible.  Am I missing something?

Another question:  As I read the proposal, the dispatch is on the left
hand side of binary operators.  Does the proposal have a way for a+b
to work where a is an old-fashioned number and b is a complex?

-- 
Erik Corry

Re: Re: New private names proposal

2010-12-16 Thread Erik Corry

On Dec 17, 2010 2:14 AM, Douglas Crockford doug...@crockford.com wrote:

 On 11:59 AM, Brendan Eich wrote:

 Really, it's starting to feel like Survivor or American Idol around
here. The apples to oranges death-matching has to stop.


 I don't mind a good deathmatch as long as it ends in death.

 We will soon be at the point where we need to start culling the strawmen
so we can focus on the stuff that will eventually go to standard. So we will
have to reach consensus on the stuff that goes forward, essentially voting
the other strawmen off the island.

Hear hear


 But I agree about the apples and oranges part. The arguments all around
need to be better targeted.


Re: Suggested RegExp Improvements

2010-11-16 Thread Erik Corry

2010/11/16 Mike Samuel mikesam...@gmail.com:
 2010/11/16 Erik Corry erik.co...@gmail.com:
 2010/11/15 Marc Harter wav...@gmail.com:
 On Mon, 2010-11-15 at 14:06 +0100, Erik Corry wrote:

 Your proposal seems to allow variable length lookbehind.  This isn't
 allowed in perl as far as I know.  I just tried the following:

  perl -e '"foobarbaz" =~ /a(?<=(ob|bab))/;'

 which gives an error on perl5.  I think if we are going to allow
 variable length lookbehind we should first find out why they don't
 have it in perl.  I think the implementation is a little tricky if you
 want to support the full regexp language in lookbehinds.

 This was not my intention.  I am proposing zero-width lookbehind, which
 would not allow for the case you specified above.  I will update the

 The issue is not with the number of characters consumed by the
 assertion.  This is indeed zero.  The issue is with the width of the
 text matched by the disjunction inside the brackets.  This is not any
 disjunction, but rather a restricted part of the regexp language that
 can only match a particular number of characters.

 It seems the .Net regexp library is able to handle arbitrary content
 in a lookbehind.  It is almost the only one.

 See http://www.regular-expressions.info/lookaround.html#lookbehind for
 more details.

 We could add this feature to JS.  As far as I can work out it
 presupposes the ability to reverse an arbitrary regexp and run it
 backwards (stepping back and backtracking forwards).  I don't think we
 should add it accidentally though, and perhaps the proposer should be
 the first to implement it.

 Don't you already have to do that to efficiently handle a regexp that
 ends at the end of the input (in JS, a non multiline $, or \z in
 java.util.regex parlance)?

V8 doesn't have a general form of that optimization.  Do the others?

 If you have the whole input string available in memory, and are trying
 to figure out whether a lookbehind (?<=x) matches at position p, can't
 you just test /(?:x)$/ against the prefix of the input of length p.
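
 The prefix test suggested here can be sketched in a few lines.  The
function name lookbehindMatchesAt and its parameters are hypothetical,
and the sketch assumes the lookbehind body is supplied as pattern source
text and that no flags are needed.

```javascript
// Emulates a lookbehind with body `source` at index `position` of `input`
// by anchoring the body at the end of the prefix input[0..position).
function lookbehindMatchesAt(input, position, source) {
  const anchored = new RegExp("(?:" + source + ")$");
  return anchored.test(input.slice(0, position));
}
```

 This gives the right yes/no answer for variable-length bodies such as
ob|bab, but it still scans the prefix forwards from every start position,
so it does not replace an engine that can run the pattern backwards.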


 proposal.  It is my understanding that lookahead as implemented in
 ECMAScript also is zero-width and not variable.  This is also how Perl has
 implemented lookbehind.

 http://perldoc.perl.org/perlre.html#Extended-Patterns

 Updated Proposal:
 https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

 The issue is not that the regexp doesn't match in perl. The issue is
 that it is not compiled at all.


 Is there an example of a language that supports the full regexp power
 in lookbehinds so we can look at their experiences with implementing
 it?

 As far as I know Perl is the de facto standard.



 2010/11/15 Marc Harter wav...@gmail.com:
 Brendan et al.,

 I have created a proposal for look-behind provided at this link:


 https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

 I hope it is a format that will be helpful for discussion with TC39.
 Admittedly, I have never written one of these before so am completely open
 to any feedback or ways to improve the document from yourself or anyone
 else
 on this list.

 Marc

 On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:

 I would be game to write up a proposal for this.  When would you need
 this by to discuss w/ TC39?

 Thanks for your consideration,
 Marc

 On Nov 12, 2010, at 5:04 PM, Brendan Eich bren...@mozilla.com wrote:

 On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

 After considering all the breadth this discussion could take maybe it
 would be wise to just focus on one issue at a time.  For me, the biggest
 missing feature is lookbehind.  It's common to most languages
 implementing the Perl-RegExp-syntax, it is very useful when looking for
 patterns that follow or don't follow a particular pattern.  I guess I'm
 confused why lookahead made it in but not lookbehind.

 This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but
 we
 proposed to ECMA TC39 TG1 (the JS group -- things were different then,
 including capitalization) something based on Perl 5. We didn't get
 everything, and we had to rationalize some obvious quirks.

 I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
 being left out on purpose. Waldemar may recall more, I'd handed him the
 JS
 keys inside netscape.com to go do mozilla.org.

 If you are game to write a proposal or mini-spec (in the style of ES5
 even), let me know. I'll chat with other TC39'ers next week about this.

 /be


 What do people
 think about including this feature?

 Marc

 On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
 I will start out with a disclaimer.  I have not read both ECMAScript
 specifications for 3 and now 5, so I admit that I am not an expert in
 the spec itself but as a user of JavaScript, I would like to get some
 expert discussion over this topic as proposed enhancements to the
 RegExp engine for Harmony.

 I will start with a list of lacking features in JS as compared

Re: Suggested RegExp Improvements

2010-11-15 Thread Erik Corry

Your proposal seems to allow variable length lookbehind.  This isn't
allowed in perl as far as I know.  I just tried the following:

 perl -e '"foobarbaz" =~ /a(?<=(ob|bab))/;'

which gives an error on perl5.  I think if we are going to allow
variable length lookbehind we should first find out why they don't
have it in perl.  I think the implementation is a little tricky if you
want to support the full regexp language in lookbehinds.

Is there an example of a language that supports the full regexp power
in lookbehinds so we can look at their experiences with implementing
it?




2010/11/15 Marc Harter wav...@gmail.com:
 Brendan et al.,

 I have created a proposal for look-behind provided at this link:

 https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM

 I hope it is a format that will be helpful for discussion with TC39.
 Admittedly, I have never written one of these before so am completely open
 to any feedback or ways to improve the document from yourself or anyone else
 on this list.

 Marc

 On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:

 I would be game to write up a proposal for this.  When would you need
 this by to discuss w/ TC39?

 Thanks for your consideration,
 Marc

 On Nov 12, 2010, at 5:04 PM, Brendan Eich bren...@mozilla.com wrote:

 On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:

 After considering all the breadth this discussion could take maybe it
 would be wise to just focus on one issue at a time.  For me, the biggest
 missing feature is lookbehind.  It's common to most languages
 implementing the Perl-RegExp-syntax, it is very useful when looking for
 patterns that follow or don't follow a particular pattern.  I guess I'm
 confused why lookahead made it in but not lookbehind.

 This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but we
 proposed to ECMA TC39 TG1 (the JS group -- things were different then,
 including capitalization) something based on Perl 5. We didn't get
 everything, and we had to rationalize some obvious quirks.

 I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
 being left out on purpose. Waldemar may recall more, I'd handed him the JS
 keys inside netscape.com to go do mozilla.org.

 If you are game to write a proposal or mini-spec (in the style of ES5
 even), let me know. I'll chat with other TC39'ers next week about this.

 /be


 What do people
 think about including this feature?

 Marc

 On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
 I will start out with a disclaimer.  I have not read both ECMAScript
 specifications for 3 and now 5, so I admit that I am not an expert in
 the spec itself but as a user of JavaScript, I would like to get some
 expert discussion over this topic as proposed enhancements to the
 RegExp engine for Harmony.

 I will start with a list of lacking features in JS as compared to Perl
 provided by (http://www.regular-expressions.info/javascript.html):

 * No \A or \Z anchors to match the start or end of the string.
   Use a caret or dollar instead.
 * Lookbehind is not supported at all. Lookahead is fully
   supported.
 * No atomic grouping or possessive quantifiers
 * No Unicode support, except for matching single characters with
   \u
 * No named capturing groups. Use numbered capturing groups
   instead.
 * No mode modifiers to set matching options within the regular
   expression.
 * No conditionals.
 * No regular expression comments. Describe your regular
   expression with JavaScript // comments instead, outside the
   regular expression string.

 I don't know if all of these need to be in the language but there
 have been some that I have personally wanted to use:

 * Lookbehind!  ECMAScript fully supports lookahead, why not
   lookbehind?  Seems like a big hole to me.
 * Named capturing groups and comments (e.g.
   http://xregexp.com/syntax/).  Mostly I argue for this because
   it makes RegExp matches more self-documenting.  Regular
   Expressions are already cryptic as it is.

 I do like some of the new flags proposed in
 (http://xregexp.com/flags/) but personally haven't used them but maybe
 that is something also for discussion.

 Marc Harter


Re: Negative indices for arrays

2010-11-12 Thread Erik Corry

2010/11/12 Dmitry A. Soshnikov dmitry.soshni...@gmail.com:

 What do you suggest?

I suggest you monkey patch a get method on the Array prototype and
forget trying to get the language semantics changed.  The people who
implement the language have made their opinions clear and as of this
post that includes V8.
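
A minimal sketch of such a patch follows.  The method name get comes from
the suggestion above; the treatment of negative indices as offsets from
the end and the use of Object.defineProperty to keep the method out of
for-in loops are assumptions of this example.

```javascript
// Adds a non-enumerable get method that accepts negative indices.
Object.defineProperty(Array.prototype, "get", {
  value: function (index) {
    // A negative index counts back from the end of the array.
    const resolved = index < 0 ? this.length + index : index;
    return this[resolved];
  },
  writable: true,
  configurable: true,
  enumerable: false,
});
```

With this in place [10, 20, 30].get(-1) returns 30, while ordinary
bracket indexing keeps its existing semantics.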

-- 
Erik Corry

Re: Concerns about weak refs and weak maps.

2010-10-29 Thread Erik Corry

I share Rick's worry about weak maps.  It's not clear how to make an
efficient GC in the presence of weak maps.

As Brendan rightly points out there is already a lot of complexity in
the GC of a typical JS implementation because it is interacting with
the (typically) reference counted, C++ based DOM.  Adding complexity
in the form of a new weak reference model looks like it could limit JS
implementations severely down the road.

One of the hallmarks of a real language implementation vs. a 'toy
scripting language' ala PHP is a good GC.  I'd really like to see
someone do a low-latency GC with weak maps before we hobble the
language with something that can't be implemented efficiently.  By
modern I mean generational and either parallel or concurrent.  (In a
parallel GC there are several threads doing GC in parallel while the
actual JS program is stopped.  In a concurrent GC there is one or more
threads doing GC while the JS program is still making progress).

It is true that the language is not multithreaded and never will be.
This does not limit the implementation from making use of several
threads/CPUs when doing garbage collection.  We actually have a V8
prototype that has a parallel GC
http://codereview.chromium.org/3615009/show though probably it won't
be landed in its current form.  We will likely be investigating a
slightly different approach.  But in principle this is quite possible:
 a single threaded JS implementation where part of the runtime is
multithreaded.

Just for clarity, since Brendan brought up the limitations of the V8
GC:  At the moment the V8 GC is accurate (not conservative), moving
(resistant to fragmentation) and generational (most pauses are small).
 But there is plenty of room for improvement in terms of very large
heaps, worst-case pause times and making use of multiple CPUs.

Let's be careful not to add things to the language that will limit
implementations from getting the sorts of modern GC implementations
that Java and .NET have.

-- 
Erik Corry

Re: Concerns about weak refs and weak maps.

2010-10-29 Thread Erik Corry

2010/10/29 Brendan Eich bren...@mozilla.com:
 On Oct 29, 2010, at 12:08 AM, Erik Corry wrote:

 One of the hallmarks of a real language implementation vs. a 'toy
 scripting language' ala PHP is a good GC.  I'd really like to see
 someone do a low-latency GC with weak maps before we hobble the
 language with something that can't be implemented efficiently.

 Any chance you guys will implement the WeakMap proposal?

I can't see Google's V8 team doing that, but anyone can take V8 and do
experiments on their branch of it.

 By modern I mean generational and either parallel or concurrent.

 Parallel vs. concurrent GC is a good distinction to bring up. We're 
 interested in parallel GC too, and we have a WeakMap prototype under way, so 
 we'll have to report back in due course.

 Still, it seems premature to rule WeakMaps out right now. Rather than 
 strangle them in the cradle, I hope several browser-based engines can 
 implement and see what we find out. The use-cases in the language remain 
 pretty compelling, without a good fallback strategy (leaky strong maps? No 
 object reference keyed maps at all?).

 Concurrent is a different animal from parallel, and I remain skeptical about 
 JS implementations  deviating significantly from shared-nothing concurrency, 
 even as the JS embeddings use threads under the hood to utilize more cores.

 /be



Re: Extensions in ES5 strict mode (was: No more modes?)

2010-10-14 Thread Erik Corry

My thoughts for what they are worth:

The semantics of const in Harmony are likely to be silently different
from the semantics it has in current non-strict implementations.  (In
particular the current const is hoisted to the surrounding function,
whereas the one in Harmony won't be, so the shadowing will be different.)
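
The shadowing difference can be illustrated with the snippet below.  It
shows only the block-scoped behaviour, assuming an engine that implements
const as a block-level binding; the function and variable names are
invented for the example, and the function-hoisted behaviour of older
engines is not reproduced here.

```javascript
function shadowingExample() {
  const limit = 1;
  let innerValue;
  {
    // With block scoping this is a separate binding that shadows the outer one.
    // With a const hoisted to the function, both declarations would name the
    // same function-level variable.
    const limit = 2;
    innerValue = limit;
  }
  return [limit, innerValue];
}
```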

Given this silent behavioural change it would be advantageous to flush
out existing uses of const before Harmony arrives and gives it a new
meaning.

Forbidding const in strict mode would seem to be a way to do that.

-- 
Erik Corry, Software Engineer
Google Denmark ApS - Frederiksborggade 20B, 1 sal,
1360 København K - Denmark - CVR nr. 28 86 69 84

Re: WeakMap API questions?

2010-09-03 Thread Erik Corry

2010/9/2 Mike Shaver mike.sha...@gmail.com:
 On Thu, Sep 2, 2010 at 11:32 AM, Erik Corry erik.co...@gmail.com wrote:
 Surely that is the case with WeakMap?  At least unless you lost the
 key and don't have any other references to the value.  In which case
 you can't reach the value any more, so why would you care whether it
 is kept alive?

 You're right; I forgot about the fact that the keys were not
 necessarily value types.  Sorry for the noise.

I wonder if this points to potential confusion stemming from the
WeakMap name.  It seems obvious (and wrong) that a WeakMap would act a
little like a HashMap where the values were WeakReferences.

Perhaps ObjectMap would be better?

-- 
Erik Corry

Re: WeakMap API questions?

2010-09-02 Thread Erik Corry

2010/8/14 Mark S. Miller erig...@google.com:
 On Sat, Aug 14, 2010 at 1:01 PM, Ash Berlin ash...@firemirror.com wrote:

 On 14 Aug 2010, at 07:22, Erik Arvidsson wrote:
  I have a few questions regarding the WeakMap API.
 
  1. Why isn't there a way to check for presence of a key (using
  has/contains)?
 
  Given that undefined is a valid value it is not sufficient to just
  return undefined for get

 Does the standard trick of:

  if (key in weakMapInstance) { }

 not work?

 It does not. A key is not a property name. A weak map is an object with two
 own properties, named get and set, whose values are the methods that
 constitute the weak map API.


 
  2. Why isn't there a way to remove a key-value-pair?
 
  Setting the value to undefined is not the same.

 Again:

  delete weakMapInstance[key];

 No. This syntax deletes named properties.


  3. Why isn't there a way to iterate over the keys?
 
  I can see that this might be a security issue but iteration is useful
  and security sensitive code can prevent iteration in several ways.

  Object.keys(weakMapInstance)

 No. Object.keys enumerates property names.

And this is as it should be.  As it stands the weak map can be used as
an object with private members.  The object key acts as a capability
that controls whether or not you have access to the private member.
If you are allowed to enumerate the keys then privacy goes out of the
window.
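
A minimal sketch of that private-member pattern is given below.  It uses
the WeakMap constructor as found in current engines, which may differ in
detail from the strawman discussed in this thread; the Account type and
its methods are invented for the example.

```javascript
// Only code that can see `balances` can read or write the private state.
const balances = new WeakMap();

function Account(initial) {
  balances.set(this, initial); // the object itself is the key
}

Account.prototype.deposit = function (amount) {
  balances.set(this, balances.get(this) + amount);
  return this;
};

Account.prototype.balance = function () {
  return balances.get(this);
};
```

Because the map offers no key enumeration, holding an Account object
gives no access to its balance without the map itself.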



 
  4. Why does set throw if the key is not an object but get doesn't?
 
  Same would go for delete and has if those are added.






 --
     Cheers,
     --MarkM


Re: es-discuss Digest, Vol 43, Issue 1

2010-09-02 Thread Erik Corry

2010/9/2 Leo Meyerovich lmeye...@gmail.com:

 On Sep 2, 2010, at 12:29 AM, Brendan Eich wrote:

 On Sep 2, 2010, at 12:08 AM, Leo Meyerovich wrote:

 That said, going back to the beginning: deterministic GC-independent 
 semantics are a Good Thing. Whether this matters seems to be a crucial 
 discussion. Is there a concern for basic correctness for more mundane code? 
 Are those use cases similar to the above where a 'strictness' library 
 suffices? Again, I'd point to the ActionScript experience for a reference: 
 can anybody attack or break a flash program with weak maps, or is this FUD? 
 I'm more concerned about the dangers of misuse in terms of memory leaks and 
 event-oriented logic glitches between code running on different ES 
 implementations (and suspect the answer has a reasonable chance of being 
 'yes' for these).

 I tend to agree that the answer is 'yes', assuming the question was "Whether
 [deterministic GC-independent semantics] matters" ;-).



 Leaving how in question -- e.g., achievable with a simple library toggle -- 
 and whether it *always* matters (such as for those would want the enumeration 
 ability and could arguably contain the non-determinism).


 It's hard to make an airtight case for security even on a good day, but 
 removing observability from the weak maps proposal makes it strictly simpler 
 and less likely to cause any number of problems. I don't believe that lack 
 of enumerability and .length cripples the API for the intended use cases of 
 soft fields, membrane caches, and the like.

 /be


 There are use cases for enumeration like proper native GC in data flow 
 abstractions and it seems others that have piped up with their own uses. 
 Obviously, different people on this list have presented different use cases 
 for weak maps and it seems those two exact scenarios it. I'd be curious about 
 further ones for enumeration  (which I suspect surface in proper 
 implementations of various event-oriented abstractions) and weak maps in 
 general -- it may be we're the only ones who care and only those two 
 scenarios are important to the committee members ;-) For example, the ability 
 to trigger a purge might matter for reliability in some uses (which I'd 
 suspect for certain transient resource pools), even if it's a no-op wrt JS 
 semantics.

I would strongly oppose any way to trigger a full garbage collection
from JavaScript.  Experience from the Java world shows that it is
inevitably abused with very serious performance consequences.

Note the mealy description here with words like "suggests" and "best
effort"
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#gc()

There is even a command line option on some VMs to disable this call
and turn it into a noop.

So please no.


 - Leo


Re: WeakMap API questions?

2010-09-02 Thread Erik Corry

2010/9/2 Mike Shaver mike.sha...@gmail.com:
 On Thu, Sep 2, 2010 at 12:46 AM, Erik Corry erik.co...@gmail.com wrote:
 And this is as it should be.  As it stands the weak map can be used as
 an object with private members.  The object key acts as a capability
 that controls whether or not you have access to the private member.

 If I were to be using an object with private members, I would
 certainly expect that those members would keep their values alive.

Surely that is the case with WeakMap?  At least unless you lost the
key and don't have any other references to the value.  In which case
you can't reach the value any more, so why would you care whether it
is kept alive?

 Wouldn't it be better to just use a regular object, and add private
 members via defineProperty to make them non-enumerable?

That is certainly a fine way to do private members if they only need
to be private by convention and there are no security issues around
them being private.
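
For comparison, the defineProperty approach looks roughly like this.  The
object and the _cache property name are made up for the example.

```javascript
const point = { x: 1 };

// Hidden from Object.keys and for-in, but not actually secret.
Object.defineProperty(point, "_cache", {
  value: 42,
  enumerable: false,
  writable: true,
});

const visibleKeys = Object.keys(point);             // enumerable own properties only
const allNames = Object.getOwnPropertyNames(point); // includes non-enumerable ones
```

Anyone holding the object can still find the member through
Object.getOwnPropertyNames, which is why this is privacy by convention
only.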

 I'm not in favour of WeakMap enumerability, really, but it seems that
 there's an easier way to address this particular use case.

 Mike


Re: EcmaScript i18n API proposal

2010-06-09 Thread Erik Corry

On the face of it this proposal introduces a huge new area of
incompatibilities between engines in terms of both which locales are
supported and the details of the locales.  The example (German
phonebook locale) is suitably obscure as to illustrate the
hopelessness of expecting JS engines to contain all thinkable locales.

From the proposal: "The biggest problem is the size and complexity of
data needed for collation process."  Given that this data is so huge,
requiring all JS engines to include it for all locales ever invented
doesn't sound good.

How about a system where locales can be described in a JSON based
format and loaded into the running JS implementation?
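
As a rough illustration of the idea, the sketch below loads a tiny
JSON description and builds a comparator from it.  Everything here is
hypothetical: the expansions field, the makeComparator function and the
simplified German phonebook rules are invented for the example, and real
collation data is far larger and more subtle.

```javascript
// Toy locale description; a page could fetch this instead of relying on
// data built into the engine.
const germanPhonebookData = JSON.parse(
  '{"expansions": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}}'
);

// Builds a comparison function from the loaded locale data.
function makeComparator(localeData) {
  const expand = (text) =>
    Array.from(text.toLowerCase(), (ch) => localeData.expansions[ch] || ch).join("");
  return (a, b) => {
    const left = expand(a);
    const right = expand(b);
    return left < right ? -1 : left > right ? 1 : 0;
  };
}

const sortedNames = ["Mz", "Müller", "Ma"].sort(makeComparator(germanPhonebookData));
```

Since the data travels with the page, every engine that runs the script
applies the same ordering.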

A positive thing about the proposal:  It doesn't, if I have understood
it correctly, have a concept of a context-global locale setting.   No
global state?  I like this.

2010/6/9 Nebojša Ćirić c...@google.com:
 We would like to propose adding i18n API to the EcmaScript standard (either
 as standard library or part of the language).
 Our current proposal is at EcmaScript i18n API (open to edits). We will
 migrate the document to the proper strawman wiki page as soon as we get
 access to it.

 We feel that our current proposal represents the minimum set of objects and
 methods needed, but we could certainly extend it to cover number/currency
 formatting/parsing and possibly calendar support. Our main goal was to start
 with minimal proposal and get early feedback from the community.
 Please leave feedback to the proposed API either inside of the document or
 post back to the mailing list.
 --
 Nebojša Ćirić

 i18n team,
 Google Inc.

Re: Adoption of the Typed Array Specification

2010-05-18 Thread Erik Corry

2010/5/18 Kenneth Russell k...@google.com:
 On Thu, May 13, 2010 at 8:28 PM, Allen Wirfs-Brock
 allen.wirfs-br...@microsoft.com wrote:
 Vladimir Vukicevic vladi...@mozilla.com said:

However, another consideration is that the WebGL spec isn't ES specific,
 and yet has to depend on typed arrays.  So perhaps we're really talking
 about two different specs: a main typed array spec that uses Web IDL and can
 be implemented generically in any language, as well as a separate spec
 describing ES types that happen to fulfill the requirements of typed arrays.

 If that is a concern, how do you expect these interfaces to work with other
 languages?  In a C++ binding are the view objects and the buffer objects
 still going to be distinct objects or do you expect to merge them into native
 C++ objects?  I think that there is a pretty fundamental question here:
 does your (and similar) application need to expose binary buffers that exist
 natively in the implementation technology of your subsystem and which can
 be interchanged among multiple client languages?  Or, are you able and
 willing to directly work with native JavaScript buffer objects (assuming
 that such things exist) even if that is a less natural form of access on your
 part?  In the first case, “host objects” may be exactly what you need.  If
 the second is what you would like, then we probably need an ECMAScript
 extension.

 Using hypothetical native JavaScript buffer objects would be
 compatible with our current relatively simple use of TypedArrays.
 However, we have begun to explore more advanced use cases including
 sharing TypedArrays among web workers, and between ECMAScript and
 browser plugins. In these situations, if we were to use native
 JavaScript buffer objects, we would need to specify additional
 behavior for the objects.

This looks like a can of worms to me.  Shared buffers break with the
shared-nothing and message-passing paradigms and necessitate
synchronization primitives.
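
To make the concern concrete, here is a minimal sketch using SharedArrayBuffer and Atomics, which were added to ECMAScript years after this thread and are not part of the proposal under discussion. It only shows the shape of the problem: once two parties alias the same bytes, plain reads and writes are no longer enough and an explicit atomic operation is needed. The variable names are illustrative, and no second worker is actually started here.

```javascript
// One shared block of memory; in a real program a second worker would
// receive `shared` via postMessage and alias the same bytes.
const shared = new SharedArrayBuffer(4);
const counter = new Int32Array(shared);

// A plain `counter[0] += 1` is a separate read and write, so two workers
// could both read 0 and both store 1. Atomics.add performs the
// read-modify-write as one indivisible step.
Atomics.add(counter, 0, 1);
Atomics.add(counter, 0, 1);

const total = Atomics.load(counter, 0);
```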

 In the Java platform, the NIO buffer classes provide similar
 functionality, and provide a bridge to the outside world. The Java
 APIs [1] specify how values can be fetched and stored in the buffers.
 A few entry points in the Java Native Interface [2] specify how
 external C code can wrap a region of memory in a NIO buffer and
 thereby expose it to Java. If it were possible to specify similar
 functionality in an ECMAScript extension, I think it would enable all
 of the use cases mentioned above.

 -Ken

 [1] http://java.sun.com/javase/6/docs/api/java/nio/package-summary.html
 [2] 
 http://java.sun.com/j2se/1.4.2/docs/guide/jni/jni-14.html#NewDirectByteBuffer

Re: names [Was: Approach of new Object methods in ES5]

2010-04-19 Thread Erik Corry

2010/4/19 Brendan Eich bren...@mozilla.com:
 On Apr 19, 2010, at 4:27 PM, Peter van der Zee wrote:

 Basically, this means we cannot introduce new language constructs or
 syntax because older implementations will trip over the code with no way to
 recover. Furthermore, for various reasons it seems feature detection is
 favored over version detection.

 When you want the new syntax, though, you're going to have to use opt-in
 versioning (see RFC4329).

Let's not go there.

The names proposal seems to be basically ephemeron tables without the
special GC semantics.

I'm a great fan of coupling proposals.  Putting a dozen uncoupled
proposals into Harmony looks like a recipe for a hodge-podge language.
 Finding powerful abstractions that solve several problems at once (in
this case weak hashes and private variables) feels much nicer.
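
A rough sketch of the coupling described here, written with the WeakMap that ECMAScript later standardised (an assumption relative to this thread, where it was still a strawman): a single weak table keyed by object serves both as a weak hash and as a store for private state. Counter and privateState are made-up names.

```javascript
// Weak table: entries do not keep their key objects alive.
const privateState = new WeakMap();

class Counter {
  constructor() {
    // State lives in the table, not on the object itself.
    privateState.set(this, { count: 0 });
  }
  increment() {
    const state = privateState.get(this);
    state.count += 1;
    return state.count;
  }
}

const c = new Counter();
c.increment();
const second = c.increment();
const visibleKeys = Object.keys(c);
```

Code that cannot reach privateState cannot read or forge the count, and nothing shows up as a property on the object.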

-- 
Erik Corry

Re: names [Was: Approach of new Object methods in ES5]

2010-04-19 Thread Erik Corry

2010/4/20 Brendan Eich bren...@mozilla.com:
 On Apr 19, 2010, at 5:15 PM, Erik Corry wrote:

 2010/4/19 Brendan Eich bren...@mozilla.com:

 On Apr 19, 2010, at 4:27 PM, Peter van der Zee wrote:

 Basically, this means we cannot introduce new language constructs or
 syntax because older implementations will trip over the code with no way to
 recover. Furthermore, for various reasons it seems feature detection is
 favored over version detection.

 When you want the new syntax, though, you're going to have to use opt-in
 versioning (see RFC4329).

 Let's not go there.

 We have new syntax in Harmony. We are going there.


 The names proposal seems to be basically ephemeron tables without the
 special GC semantics.

 That is over-specification and implementation-as-specification, and it will
 not fly in TC39.


 I'm a great fan of coupling proposals.

 Have you heard of the multiplication principle?

http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=multiplication+principle ?


 I like my odds ratios bigger, thank you very much.

You are welcome.

 I've strongly advised
 Mark on this point and he has adapted his proposals.


 Putting a dozen uncoupled
 proposals into Harmony looks like a recipe for a hodge-podge language.

 Hodge-podge is what you get by implementation-as-specification.


 Finding powerful abstractions that solve several problems at once (in
 this case weak hashes and private variables) feels much nicer.

 A name abstraction that is concrete in terms of GC, object type (typeof),
 and the possibility of non-leaf Name objects is not abstract at all -- it is
 concretely an EphemeronTable.

 TC39 wants sugar for names. But desugaring taken too far becomes
 compilation, which is not simple syntax rewriting. It's also observable via
 typeof, Name objects having properties, and other effects (I'm willing to
 bet).

 Real abstractions serve use-cases, that is, pressing user needs, without
 implementing abstractions in overtly leaky ways. That's what we need for
 Names, and many other proposals. This does not make a hodge-podge if we
 serve the important use-cases. It makes a better language.

 If we can unify abstractions without leaks, sure. That's not the case here.

 /be


Re: Proposal for exact matching and matching at a position in RegExp

2010-02-12 Thread Erik Corry

2010/2/12 Andy Chu a...@chubot.org:
 On Thu, Feb 11, 2010 at 10:24 PM, Steve L. steves_l...@hotmail.com wrote:
 Outside of es-discuss, Brendan Eich asked for my thoughts on the merits of
 \G vs. /y (intrinsically and in light of backward compatibility). I sent the
 following reply, which he thought would be useful to forward to the list

 I have no preference between /y and \G. When I first saw /y proposed for
 ES4, I felt it needlessly reinvented the wheel given that \G had already
 been implemented pretty widely. On the other hand, the fact that \G reaches
 out of the search pattern to read a property of a regex or string feels a
 bit too much like magic to me, and implementing it as a flag (/y) seems less
 weird. An argument in favor of \G is that it's more versatile than /y since
 it can be used anywhere in a regex pattern (e.g., at the start of an
 alternation option), not just as the leading element.

 Agree that \G breaks some logical barrier.  I like to have a mental
 model of the implementation internals, and \G breaks that a bit.

\G is more flexible and it is rather similar to ^ conceptually.

The mental model happens to be out of sync with how regexps are
implemented.  The implicit .*? at the start of a regexp is actually
the fastest way to implement since you are using the fast internal
search mechanisms that the regexp engine has rather than an external
loop that repeatedly asks "does it match here?".

Certainly if the /y variant is adopted then V8 will implement it as if
it were specified with \G.  I.e. there would be two different regexps
behind the scenes, one with and one without /y.  This is similar to
what would happen if you could specify /i at match time instead of
compile time.
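
For reference, a small sketch of the /y behaviour being discussed, using the sticky flag as it was later standardised in ECMAScript 2015 (at the time of this thread it was a Mozilla extension). The sample string and offsets are made up.

```javascript
const text = 'foo bar';

// Without /y (but with /g so lastIndex is honoured) the engine scans
// forward from lastIndex until it finds a match.
const scanning = /bar/g;
scanning.lastIndex = 2;
const scanned = scanning.exec(text);

// With /y the match must begin exactly at lastIndex.
const sticky = /bar/y;
sticky.lastIndex = 2;
const missed = sticky.exec(text);

// A failed sticky match resets lastIndex to 0, so set it again.
sticky.lastIndex = 4;
const hit = sticky.exec(text);
```

Here scanned and hit both find "bar" at index 4, while missed is null because "bar" does not start at index 2.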

 If compatibility with Mozilla is not an issue, I actually prefer
 Python's approach of .search() vs. .match().  It's not a part of the
 regex; it's not a property of the regex; it's how you *apply* the
 regex to a string.  Just like you can apply the same regex with
 .split() or .exec() or .replace().  They're orthogonal issues in my
 mind.

 Though as mentioned, gracefully upgrading with ES3-5 is an issue, so I
 could only think of .exec() and .execLeft() for a left-anchored match.

 One thing I didn't bring up is that Python actually has an endpos
 argument.  You do regex.search(s, 10, 20), and it will stop at
 position 20.  I couldn't think of a real use case for this.  But
 anyone can think of one, that might be a consideration and sway things
 in favor of separate methods.

 Andy



 Note that \G works a bit differently across implementations. In some cases
 it matches the start position of the current match (PCRE, Ruby), and
 elsewhere it matches the end position of the previous match (Perl, Java,
 .NET). Of course, this distinction only matters after a zero-length match
 (since that increments the start position of the next search).

 Perl has extra functionality around \G that makes it more useful.
 Specifically, the fact that the location associated with \G is an attribute
 of target strings (pos()) means that multiple regexes with \G can match
 against a string in turn and they'll each pick up where the others left off.
 Combine this with Perl's /c modifier (which prevents failed matches from
 resetting the \G location) and you can run multiple regexes with \G and /c
 against a string and advance only when there's a match. Here's a crappy
 example:

 while ($html !~ /\G$/gc) {
   if ($html =~ /\G[^]+/gc) {
       ...
   } elsif ($html =~ /\G(\w+)[^]+/gc) {
       ...
   } elsif ($html =~ /\G#?\w+;/gc) {
       ...
   }
 }

 Sorry for the tangent, but I thought it might be helpful to describe how \G
 is used elsewhere.

 Steven Levithan
 http://blog.stevenlevithan.com

 --
 From: Steve L. steves_l...@hotmail.com
 Sent: Wednesday, February 10, 2010 10:46 AM
 To: Andy Chu a...@chubot.org; es-discuss es-discuss@mozilla.org
 Subject: Re: Proposal for exact matching and matching at a position in
 RegExp


 http://andychu.net/ecmascript/RegExp-Enhancements-2.html

 Basically the proposal is to add parameters which can override the
 internal state of the RegExp.

 Does anyone have any comments on this?

 Can I put it in a place where it will be considered for the next
 ECMAScript?  The overall idea seems relatively uncontroversial since
 it was already implemented by Mozilla (for the exact same reason).  I
 have proposed a specific API enhancement too.

 I do not believe it was implemented for the exact same reason. It seems
 you are merely looking for a way to match exactly at a given character
 position, and you correctly note that /y is not an elegant solution for
 this problem. However, although /y can be used to solve this problem, my
 understanding is that it was designed to work similarly to the \G regex
 token from Perl/PCRE/Java/.NET/etc. while tying in nicely with the
 lastIndex property. An important feature of /y (and \G from other regex

Re: Proposal for exact matching and matching at a position in RegExp

2010-02-09 Thread Erik Corry

2010/2/9 Andy Chu a...@chubot.org:
 On Wed, Jan 27, 2010 at 10:03 PM, Andy Chu a...@chubot.org wrote:
 (The original message was held up in spam moderation for awhile)

 Here is an addendum, after it was pointed out to me that this issue
 has come up before:

 http://andychu.net/ecmascript/RegExp-Enhancements-2.html

 Basically the proposal is to add parameters which can override the
 internal state of the RegExp.

 Does anyone have any comments on this?

I think it would be nice to have this feature.  For example the
Windscorpion xml test would run much faster with such an option.

However, expanding the language by adding extra parameters to existing
functions is annoying because it means you can't test for the presence
or absence of the feature with a simple if:

if (RegExp.funkyNewFunction) {
  ...
}

I think that using the length property on the function is going to be
unreliable given the existing variation in those values.

The way we do regexps in V8 the 'y' is part of the regexp and we would
have to recompile the regexp to handle with or without 'y'.  In other
words in our implementation we have an implicit .*?( at the start of
the regexp that indicates that there is a non-greedy match-anything
before the 0th capture parenthesis.  This is a pretty fast and clean
approach.  I think having something that matched 'the point where
searching started' would be more flexible than the global Y flag.
Then you could use it like '^' as one part of an alternation.   Perl
uses \G for the same concept and I think following that would be fine.
 To test for the presence of the feature you could use

!/\Gx/.test("Gx")

which returns true if the feature is implemented.
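
A sketch of both detection styles mentioned above. The helper names are invented. The flag check relies on the RegExp constructor throwing a SyntaxError for unknown flags; the \G check relies on engines without \G treating it as an escaped literal "G", which is how non-Unicode-mode patterns behave in today's engines.

```javascript
// Detect a regexp flag by trying to construct a regexp with it.
function supportsFlag(flag) {
  try {
    new RegExp('', flag);
    return true;
  } catch (e) {
    return false;
  }
}

// Detect a \G assertion: where \G is unknown it matches a literal "G",
// so /\Gx/ matches "Gx"; where it is an assertion, the pattern fails
// because "x" would have to match the "G" at position 0.
function supportsBackslashG() {
  return !/\Gx/.test('Gx');
}

const hasSticky = supportsFlag('y');
const hasBogus = supportsFlag('Q');
const hasG = supportsBackslashG();
```

ECMAScript eventually adopted the /y flag rather than \G, so on current engines hasSticky is expected to be true and hasG false.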


 Can I put it in a place where it will be considered for the next
 ECMAScript?  The overall idea seems relatively uncontroversial since
 it was already implemented by Mozilla (for the exact same reason).  I
 have proposed a specific API enhancement too.

 thanks,
 Andy

Re: string.toUpperCase

2009-12-23 Thread Erik Corry

Given that Chrome's behaviour feels like the right thing to do and is
allowed by the standard and matches Safari I think we'll probably keep it.

On Dec 24, 2009 7:40 AM, Maciej Stachowiak m...@apple.com wrote:

On Dec 23, 2009, at 10:07 PM, Douglas Crockford wrote:  'die Straße geht
jemals jemals auf'.toUpp...
My results in Safari 4.0.4 on Mac match what you have for Chrome.

Regards,
Maciej


Re: Weak references and destructors

2009-12-11 Thread Erik Corry

2009/12/11 Mark S. Miller erig...@google.com:


 On Thu, Dec 10, 2009 at 11:38 AM, Brendan Eich bren...@mozilla.com wrote:

 On Dec 10, 2009, at 11:27 AM, Mark S. Miller wrote:

 By all means, let's continue hashing it out. I posted this proposal to
 es-discuss and presented it to the committee some time ago. I do not recall
 any serious objections, and I do recall several positive responses. However,
 the committee has not yet made any decision. If there were serious
 objections I have forgotten, my apologies, and I ask that you (re?)post them
 to es-discuss.

 Allen had some thoughts, but we were out of time at the last face to face.
 I'll let him speak for himself.
 The issue that I raised at the last meeting, other than naming nits (which
 we can defer for now), was in response to this:
  All visible notification happens via setTimeout() in order to avoid plan
 interference hazards. Side effects from these notifications happen in their
 own event-loop turn, rather than being interleaved with any ongoing
 sequential computation. However, this requires us to promote setTimeout()
 and event-loop concurrency into the ES-Harmony specification, which is still
 controversial.
 from http://wiki.ecmascript.org/doku.php?id=strawman:weak_references.
 We may not standardize the execution model to the degree you hope.

 I do think we should standardize (set|clear)(Timeout|Interval) and
 event-loop concurrency, as server-side JavaScript use is already moving
 towards adopting it, making its need independent of the browser. However, I
 agree that these proposals should be decoupled if possible. Accordingly, I
 have kludged the weak pointer proposal by modifying the definitional WeakPtr
 code at the top and adding the following text to the paragraph you quote
 above:

 In order to postpone the issue, the spec implied by the above code should
 be taken literally: If there is no global binding for setTimeout or if it
 is bound to a non-callable value (at the time WeakPtr is called), then no
 notifications happen. If the value of the global setTimeout is callable,
 then the GC calls it at some arbitrary time, passing in a frozen function
 whose only purpose is to call the registered executor function.
 If setTimeout has its normal binding (e.g., in the browser), then the
 executor will only be called later in a separate turn as expected,
 protecting us from plan interference hazards. A secure runtime in such an
 environment can always freeze the global setTimeout property, preventing its
 redefinition to something that could cause plan interference.

I really dislike this definition.  This would imply that anyone could
overwrite setTimeout and get a completely different behaviour.  If
overwriting is impossible then it introduces setTimeout into the
standard by the backdoor.

I'd prefer an underspecified [[QueueForProcessing]] operation with no
connection to the global object and a note to say that in a browser it
would be expected to use the same mechanism as a setTimeout with a
timeout of zero.
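
The "separate turn" behaviour referred to here can be sketched with a zero-delay setTimeout, assuming a host (browser or Node.js) that provides it; log and notify are illustrative names.

```javascript
const log = [];

function notify() {
  // Runs in its own event-loop turn, after the current code has finished.
  log.push('notification');
}

log.push('start');
setTimeout(notify, 0); // queued, not run yet
log.push('end');

// At this point the callback has not been interleaved with the
// sequential code above.
const snapshot = log.slice();
```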




 We also may not agree on notification being guaranteed. At the last f2f I
 mentioned how generators in JS1.7+ have a close method, again after Python,
 but without the unnecessary GeneratorExit built-in exception object thrown
 by close at a generator in case it has yielded in a try with a finally.
 Naively supporting notification guarantees creates trivial denial of service
 attacks and accidents.
 Of course, we could say the universe ends without pending notifications
 being delivered, but in the effectful browser, the multiverse goes on and
 there are lots of channels between 'verses ;-).

There are lots of misunderstandings around GC, where people expect
this sort of callback to happen at some predictable time.  If there's
no memory pressure then there's no reason to expect the GC to ever be
run even if the program runs for ever.  It would be nice to have some
indication in the text of the standard that discouraged people from
expecting a callback at some predictable time.  For example if people
want to close file descriptors or collect other resources that are not
memory using this mechanism it would be nice to discourage them
(because it won't work on a machine with lots of memory and not so
many max open fds).
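
One way to follow this advice is to release non-memory resources explicitly rather than from a collection callback. A minimal sketch, with a made-up FakeFile class standing in for a real descriptor:

```javascript
// Hypothetical resource wrapper; a real one would hold an fd.
class FakeFile {
  constructor(name) {
    this.name = name;
    this.open = true;
  }
  close() {
    this.open = false;
  }
}

// Run `body` with the resource and always close it afterwards, whether
// or not the collector ever runs.
function withFile(name, body) {
  const file = new FakeFile(name);
  try {
    return body(file);
  } finally {
    file.close();
  }
}

let lastFile = null;
const nameLength = withFile('data.txt', (file) => {
  lastFile = file;
  return file.name.length;
});
```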

 In general I'd like to decouple weak references from hairy execution model
 issues. If we can't do this, then the risk we'll fail to get weak refs into
 the next edition goes up quite a bit. The obvious way to decouple is to
 underspecify.


 I think I'd be willing to weaken this from eventual notification to
 optional eventual notification. But I do not yet understand this issue.
 How does a guarantee of eventual notification lead to any more vulnerability
 to denial of service than while(true){} ?

 --
    Cheers,
    --MarkM


Re: array like objects

2009-12-08 Thread Erik Corry

2009/12/8 Mike Samuel mikesam...@gmail.com:
 It occurred to me after looking at the proxy strawman that it might
 help to nail down what array-like means in future drafts.  It isn't
 directly related to the proxy stuff though so I thought I'd start a
 separate thread.

 I've seen quite a bit of library code that does something like
   if (isArrayLike(input)) {
     // iterate over properties [0,length) in increasing order
   } else {
     // Iterate over key value pairs
   }

This looks fairly broken to me.  If the object has enumerable
properties that aren't positive integers then they don't get iterated
over just because some heuristic says it's array-like.  If the
heuristic says it's array-like then we iterate over potentially
billions of indexes even if it is very sparse.

I think there are two different questions being asked here.

1) Does the object have the special semantics around .length?  This is
potentially useful to know and the normal way to find out doesn't
work.  Instanceof is affected by setting __proto__, yet the special
.length handling persists and instanceof doesn't work for cross-iframe
objects.

2) Is it more efficient to iterate over this object with for-in or is
it more efficient (and sufficient) to iterate with a loop from 0 to
length-1?  You can't implement functions like slice properly without
this information and there's no way to get it.
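
To illustrate the sparse case in question 2, here is a small sketch (the values are invented). A loop over 0 to length-1 would take a billion steps here, while enumerating own keys visits the single element that exists.

```javascript
const sparse = [];
sparse[1000000000] = 'only element';

// length is one past the highest index, regardless of how many
// elements actually exist.
const sparseLength = sparse.length;

// Own enumerable keys: just the one index, as a string.
const keys = Object.keys(sparse);
```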

 but different libraries defined array-like in different ways.
 Some ways I've seen:
    (1) input instanceof Array
    (2) Object.prototype.toString(input) === '[object Array]'
    (3) input.length === (input.length >> 0)
 etc.

These all look like failed attempts to answer one or both of the two
questions above.
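
For question 1, ES5's Array.isArray answers whether a value is a real array independently of its prototype chain. A short sketch, using Object.setPrototypeOf (ES2015) where the thread talks about assigning __proto__:

```javascript
const arr = [1, 2, 3];

// Detach the array from Array.prototype.
Object.setPrototypeOf(arr, null);

const viaInstanceof = arr instanceof Array; // walks the prototype chain
const viaIsArray = Array.isArray(arr);      // checks the object itself

// The special length behaviour persists.
arr[9] = 'x';
const lengthAfterWrite = arr.length;
```

Here instanceof reports false while Array.isArray still reports true, and writing index 9 bumps length to 10.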


 The common thread with array like objects is that they are meant to be
 iterated over in series.
 It might simplify library code and reduce confusion among clients of
 these libraries if there is some consistent definition of series-ness.
 This committee might want to get involved since it could affect
 discussions on a few topics:
   (1) key iteration order
   (2) generators/iterators
   (3) catchall proposals
   (4) type systems

 One strawman definition for an array like object:
    o is an array like object if o[[Get]]('length') returns a valid
 array index or one greater than the largest valid array index.

 The need to distinguish between the two in library code could be
 mooted if for (in) on arrays iterated over the array index properties
 of arrays in numeric order first, followed by other properties in
 insertion order, and host objects like NodeCollections followed suit.

FWIW V8 does this for both arrays and objects.

But really for..in is pretty sick for arrays.  It will convert every
single index in the array to a string.  That's not something to be
encouraged IMHO.
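
Both points can be seen in a few lines on current engines; the sample data is made up.

```javascript
const list = ['a', 'b'];
list.extra = true;

// for..in hands back every key as a string, including array indexes.
const seen = [];
for (const key in list) {
  seen.push(typeof key + ':' + key);
}

// Integer-like keys come first in ascending order, then the rest in
// insertion order.
const order = Object.keys({ b: 1, 2: 1, a: 1, 1: 1 });
```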

-- 
Erik Corry

Re: Anti-pollution in ES5 by static verification (was: Addition of a global namespace function?)

2009-12-04 Thread Erik Corry

2009/12/4 Mark Miller erig...@gmail.com:
 On Fri, Dec 4, 2009 at 9:52 AM, Mark Miller erig...@gmail.com wrote:

 Given that primordials (other than the global object) are transitively
 frozen and that the above whitelist was adequately restrictive, each
 call of a closed function is fully isolated -- its connectivity to the
 world outside itself is fully under control of its caller. If the
 module-function's caller denies access to the global object, the
 indirect eval function, and to the Function constructor, then the
 module cannot pollute non-local state.

 Note that Function.prototype.constructor should either not be on the
 whitelist (and should thereby be deleted), or it should be reassigned
 to something safe during the initial clean-or-die phase. Otherwise
 (function(){}).constructor would give access to the Function
 constructor, allowing global pollution after all.

 I cannot currently find in the ES5 spec whether a conforming
 implementation may/must allow Function.prototype.constructor to be
 deleted or reassigned. Where in the spec is this dealt with?

I think you have to allow all such properties to be deleted unless
they have DontDelete.

Luckily it's not one of the magic undeletable properties in JSC and
V8: https://bugs.webkit.org/show_bug.cgi?id=25527 (ignore misleading
bug title).
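
In ES5 terms, DontDelete became the [[Configurable]] attribute, which can be inspected directly. A sketch that checks the attribute, deletes the property, and then restores it so the rest of the program is unaffected:

```javascript
const desc = Object.getOwnPropertyDescriptor(Function.prototype, 'constructor');

// Delete it, observe the effect, then put it back.
const original = Function.prototype.constructor;
let deleted;
let reachable;
try {
  deleted = delete Function.prototype.constructor;
  // Lookup now falls through to Object.prototype.constructor.
  reachable = (function () {}).constructor === original;
} finally {
  Object.defineProperty(Function.prototype, 'constructor', desc);
}
```

While the property is deleted, (function(){}).constructor resolves to Object rather than Function, so the Function constructor is no longer reachable by that path.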

-- 
Erik Corry

Re: Why decimal?

2009-06-24 Thread Erik Corry

2009/6/23 Brendan Eich bren...@mozilla.com

 On Jun 23, 2009, at 12:18 AM, Christian Plesner Hansen wrote:

 I've been looking around on the web for reasons why decimal arithmetic
 should be added to ES.  The most extensive page I could find was
 http://speleotrove.com/decimal/decifaq.html.  If anyone knows of other
 good sources of information about decimal and its inclusion in ES
 please follow up.


 Mike Cowlishaw's pages on decimal have lots of arguments for it:

 http://www2.hursley.ibm.com/decimal/decifaq.html
 http://www2.hursley.ibm.com/decimal/


I'm afraid both these links seem to have broken.


 http://www2.hursley.ibm.com/decimal/

 The most-duplicated JS bug in bugzilla.mozilla.org is

 https://bugzilla.mozilla.org/show_bug.cgi?id=5856

 Here's a typical naive JS user complaining that his computer is broken
 because it can't calculate correct differences:

 ... I typed in:

 9533.24-215.10

 … and here is the garbage Apple babbled back at me: 9318.1399

 He blamed Apple. Naive users often blame hardware for software bugs.


 The strongest argument seems to be financial: binary arithmetic is
 approximate in a way that makes it unsuitable for financial
 calculations.  I buy this argument in general -- I would definitely
 want my bank to use some form of decimal arithmetic rather than binary
 -- but less so in the context of ES.


 Do you consider that naive user's calculator example to be financial? I
 do not.

 The problem is worse for non-experts. The experts can cope.

 Anyway, decimal is not being pushed into JS at this point. At the last
 face-to-face TC39 meeting, we changed direction to explore generalizing
 value type support (including operators and literals if we can) so that
 libraries could add first class number-like types.

 Whether any new value type would be native or self-hosted, and whether it
 would be included in the core standard, are issues we want to defer until
 later, ideally until there are de-facto standards to codify.

 The counter-argument articulated at the meeting by Sam was that it's rare
 for users to download binary extensions to JS for browsers (Flash is the
 only exception, and it's not primarily a JS extension). So users won't get
 decimal unless it is part of the normative core spec, so the usability bug
 reported as Mozilla bug 5856 won't get fixed.

 I think Sam has a point; lack of a standard could be a problem. But
 whatever we do about it, the committee agreed to work on value types first.
 They're on the Harmony agenda.

 /be
