Re: Case transformations in strings

2009-03-24 Thread Lasse R.H. Nielsen
On Tue, 24 Mar 2009 00:50:24 +0100, David-Sarah Hopwood
<david.hopw...@industrial-designers.co.uk> wrote:

> If converting one character to many would cause a problem with the
> reference to toUpperCase in the regular expression algorithm, then
> presumably Safari and Chrome would hit that problem. Do they, or
> do they use different uppercase conversions for regexps vs
> toUpperCase?


The Regular Expression specification in ES3 doesn't use toUpperCase
directly, but rather the Canonicalize helper function (15.10.2.8).
It states:

 2. Let u be ch converted to upper case as if by calling
    String.prototype.toUpperCase on the one-character string ch.
 3. If u does not consist of a single character, return ch.

I.e., it uses a different algorithm for regexps than for strings.
(It also prevents non-ASCII characters from canonicalizing to ASCII
characters.)
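
As a rough illustration, the Canonicalize steps can be sketched in plain ECMAScript. This is only a sketch: the function name canonicalize and the ignoreCase parameter are illustrative, the steps not quoted above (the IgnoreCase check and the non-ASCII-to-ASCII guard) are paraphrased from 15.10.2.8, and the results depend on the host's toUpperCase tables.

```javascript
// Sketch of Canonicalize (ES3 15.10.2.8); names are illustrative, not from the spec.
function canonicalize(ch, ignoreCase) {
  // Case-sensitive matching compares characters unchanged.
  if (!ignoreCase) return ch;
  // "as if by calling String.prototype.toUpperCase on the one-character string ch"
  var u = ch.toUpperCase();
  // A one-to-many mapping (for example U+00DF to "SS") is ignored.
  if (u.length !== 1) return ch;
  // A non-ASCII character never canonicalizes to an ASCII character.
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) return ch;
  return u;
}
```

With the Unicode tables in current engines, canonicalize("a", true) gives "A", while "\u00DF" (sharp s) and "\u0131" (dotless i) come back unchanged.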



> If the latter, then we should allow that, and probably require it.


It's allowed, and required, already, so that's an easy fix :)

/Lasse
___
Es-discuss mailing list
Es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Targeting the ecmascript AST

2009-03-24 Thread kevin curtis
Following on from the recent discussions on ecmascript as a compiler target
and Brendan's point re targeting the AST, I have coded a user-friendly
pseudocode-ish syntax which targets the V8 ecmascript AST: 'zedscript'
aka 'zed is ecmascript for dummies'. (And given the focus of the list
I won't be posting follow-ups - but I hope it may be of interest to some
readers).
The code can be found at:
http://code.google.com/p/zedscript/

Zedscript runs _in_ the V8 ecmascript engine. That is, there is no
'zedscript source -> javascript source' translation phase. It targets
the AST directly.

The V8 scanning/parsing source code has been altered to allow the V8
engine to compile and run both ecmascript and zedscript. It reuses the V8
tokens/scanning/parsing/runtime/errorhandling machinery. This reduces
abstraction leakages and aids debugging. zedscript can call ecmascript
and vice-versa. (Calling jsfunction.toString() can be surprising however!).

A zedscript script can be run via the V8 shell:
./shell zedscriptfile

So, zedscript provides a thin layer of syntax sugar over the core
ecmascript semantics which will (hopefully):

* show the emerging rich 3.1 and 4 (Harmony) semantics in the best
  possible light.
* minimize quirks and gotchas.
* emphasize simplicity, security, safety and speed. For instance
  ecmascript 3.1 'strict mode' could be enabled by default.
* not be a port of an existing language, e.g. python or ruby. (Imho
  ecmascript does not need 1001 IronXXX ports.
  It needs one good alternate syntax - which can take inspiration
  from the sugar/syntax of other pragmatic languages - but can be
  considered a dialect of the core ecmascript semantics rather than
  a new language or port).
* hit the sweet spot between succinctness and pseudocode readability.

The goal is to track ecmascript 3.1 and 4/Harmony and deliver
zedscript 3.1 and 4 on the V8 engine. (And maybe on tracemonkey and
sfx/nitro).  It could be considered a synthesis of Brendan's goal of sugar
for es4 and Douglas Crockford's idea re a 'new' language. I feel that some
users - especially those without a comp. sci background - will never get on
with the curlies syntax. An alternate syntax in addition to the javascript
syntax - especially for esHarmony - could really help promote ecmascript as
a general purpose scripting language. Even if something like this never gets
into the browser it would be useful for server and desktop development.

Here is an example, 'ztest/sample.js'. Note how the script begins with
//zed to signal to V8 that it's a zedscript file.
(This is a temporary solution).

//zed
// - currently has a dylan/moo -ish syntax
// - the parens around the expression could change to
//   a more ruby/lua -ish syntax, with optional do/then.

print("*** start\n")

var x = 10
var y = 20


// if has elif clauses
// and/or are aliases for &&/||
if (x == -99)
print("FAIL")
elif (x == 10 and y > 15)
print("OK")
elif (x == -99 or y < -99)
print("FAIL")
else
print("FAIL")
end


// not as alias for !
var b = false;
print(!b)
print(not b)

// fn as alias for function
fn times2(i)
return i * 2
end

var z = 1
while (z <= 5)
print(z + " : " + times2(z++))
end

print("\n*** end ");
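
For comparison, a plausible plain-ECMAScript rendering of the sample is sketched below. It rests on assumptions: that elif, and, or, not, fn and end map to else if, &&, ||, !, function and closing braces; that the comparison operators are >, < and <=; and that a locally defined print (a stand-in for the V8 shell's print, collecting into a hypothetical out array) is acceptable for inspecting the output.

```javascript
// Stand-in for the V8 shell's print: collect output lines for inspection.
var out = [];
function print(v) { out.push(String(v)); }

print("*** start\n");

var x = 10;
var y = 20;

// elif becomes else if; and/or become &&/||
if (x == -99) {
  print("FAIL");
} else if (x == 10 && y > 15) {
  print("OK");
} else if (x == -99 || y < -99) {
  print("FAIL");
} else {
  print("FAIL");
}

// not becomes !
var b = false;
print(!b);
print(!b);

// fn becomes function
function times2(i) {
  return i * 2;
}

var z = 1;
while (z <= 5) {
  // z is read before z++ takes effect, so each line pairs z with 2 * z
  print(z + " : " + times2(z++));
}

print("\n*** end ");
```

Under those assumptions the if chain takes the second branch, and the loop pairs each z from 1 to 5 with its double.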



RE: Exactly where is a RegularExpressionLiteral allowed?

2009-03-24 Thread Allen Wirfs-Brock
OK, let's try to wrap up this issue.

In addition to adding RegularExpressionLiteral to Literal, do we also agree to 
delete the third paragraph of section 7 that says:

"Note that contexts exist in the syntactic grammar where both a division and a
RegularExpressionLiteral are permitted by the syntactic grammar; however, since
the lexical grammar uses the InputElementDiv goal symbol in such cases, the
opening slash is not recognised as starting a regular expression literal in
such a context. As a workaround, one may enclose the regular expression literal
in parentheses."

The second paragraph says: "The InputElementDiv symbol is used in those
syntactic grammar contexts where a division (/) or division-assignment (/=)
operator is permitted."
Should we insert the word "initial" (or "leading") immediately in front of
"division" to clarify where such contexts occur?

Allen




Re: Case transformations in strings

2009-03-24 Thread David-Sarah Hopwood
Christian Plesner Hansen wrote:
> David-Sarah Hopwood wrote:
>> If converting one character to many would cause a problem with the
>> reference to toUpperCase in the regular expression algorithm, then
>> presumably Safari and Chrome would hit that problem. Do they, or
>> do they use different uppercase conversions for regexps vs
>> toUpperCase?
>
> Chrome uses context (but not locale) sensitive special casing for
> ordinary toUpperCase.  For regexps it uses the same mapping but
> doesn't convert chars that map to more than one char and non-ascii
> chars that would have converted to ascii chars.  We would have liked
> to use the full multi-character mapping without the exception for
> ascii but couldn't for compatibility reasons.

Can you expand on what the compatibility problem was for
non-ASCII -> ASCII mappings in regexps?

-- 
David-Sarah Hopwood ⚥



Re: Case transformations in strings

2009-03-24 Thread David-Sarah Hopwood
David-Sarah Hopwood wrote:
> Christian Plesner Hansen wrote:
>> David-Sarah Hopwood wrote:
>>> If converting one character to many would cause a problem with the
>>> reference to toUpperCase in the regular expression algorithm, then
>>> presumably Safari and Chrome would hit that problem. Do they, or
>>> do they use different uppercase conversions for regexps vs
>>> toUpperCase?
>> Chrome uses context (but not locale) sensitive special casing for
>> ordinary toUpperCase.  For regexps it uses the same mapping but
>> doesn't convert chars that map to more than one char and non-ascii
>> chars that would have converted to ascii chars.  We would have liked
>> to use the full multi-character mapping without the exception for
>> ascii but couldn't for compatibility reasons.
>
> Can you expand on what the compatibility problem was for
> non-ASCII -> ASCII mappings in regexps?

Oh, never mind -- this is required by step 5 of Canonicalize in section
15.10.2.8.

So, there would be no regexp-related problems with requiring toUpperCase
to perform multi-code-unit and/or context-sensitive mappings in ES3.1.
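
A quick way to see the two mappings side by side is sketched below. This is only an illustration: the results depend on the engine's Unicode tables, and the variable names are hypothetical.

```javascript
// String case conversion may expand one character into several...
var sharpS = "\u00DF";              // LATIN SMALL LETTER SHARP S
var upper = sharpS.toUpperCase();   // "SS" in engines that apply the full mapping

// ...while case-insensitive regexp matching canonicalizes one character
// at a time, so the sharp s only matches itself.
var matchesExpanded = /\u00DF/i.test("SS");
var matchesItself = /\u00DF/i.test("\u00DF");

// Dotless i (U+0131) uppercases to ASCII "I", but Canonicalize refuses a
// non-ASCII to ASCII mapping, so it does not match "i" case-insensitively.
var dotlessMatchesI = /\u0131/i.test("i");
```

Because the regexp side already ignores multi-character results, widening what toUpperCase returns for strings leaves these matches unchanged.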

-- 
David-Sarah Hopwood ⚥



Re: Exactly where is a RegularExpressionLiteral allowed?

2009-03-24 Thread David-Sarah Hopwood
Waldemar Horwat wrote:
> David-Sarah Hopwood wrote:
>> I'll repeat my argument here for convenience:
>>
>>   A DivisionPunctuator must be preceded by an expression.
>>   A RegularExpressionLiteral is itself an expression.
>>
>> (This assumes that the omission of RegularExpressionLiteral from
>> Literal is a bug.)
>>
>>   Therefore, for there to exist syntactic contexts in which either
>>   a DivisionPunctuator or a RegularExpressionLiteral could occur,
>>   it would have to be possible for an expression to immediately
>>   follow [*] another expression with no intervening operator.
>>   The only case in which that can occur is where a semicolon is
>>   automatically inserted between the two expressions.
>>   Assume that case: then the second expression cannot begin
>>   with [*] a token whose first character is '/', because that
>>   would have been interpreted as a DivisionPunctuator, and so
>>   no semicolon insertion would have occurred (because semicolon
>>   insertion only occurs where there would otherwise have been a
>>   syntax error); contradiction.
>
> Yes, I verified when we were writing ES3 that this was the only case
> where the syntactic grammar permitted a / to serve as both a division
> (or division-assignment) and a regexp literal.  The interaction of
> lexing and semicolon insertion would have been unclear (how can you say
> that the next token is invalid if you don't know how to lex it?), so we
> wrote the spec to explicitly resolve those in favor of division.

If that is what the note is intended to clarify, I think its current
wording is more confusing than helpful. It certainly confused me.
Anyway, there is no case in which a regexp needs to be parenthesized
to avoid lexical ambiguity.

How about replacing the current wording by something that specifically
discusses the semicolon insertion issue, with an example:

  There are two goal symbols for the lexical grammar. The InputElementDiv
  symbol is used in those syntactic grammar contexts where a leading
  division (/) or division-assignment (/=) operator is permitted. The
  InputElementRegExp symbol is used in other syntactic grammar contexts.

  NOTE
  There are no syntactic grammar contexts where both a leading division
  or division-assignment, and a leading RegularExpressionLiteral are
  permitted. This is not affected by semicolon insertion (section 7.9);
  in examples such as the following:

      a = b
      /hi/g.exec(c).map(d);

  where the first non-whitespace, non-comment character after a
  LineTerminator is '/' and the syntactic context allows division or
  division-assignment, no semicolon is inserted at the LineTerminator.
  That is, this example is interpreted in the same way as:

      a = b / hi / g.exec(c).map(d);
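
The interpretation can be checked with a small runnable sketch. The values bound to b, hi, g, c and d are hypothetical, chosen only so that the division reading produces a number.

```javascript
// Hypothetical bindings so that the division reading is well defined.
var b = 100;
var hi = 5;
var c = "input";
var d = function (v) { return v; };
// g.exec(c).map(d) is arranged to evaluate to 4.
var g = {
  exec: function (s) {
    return { map: function (f) { return f(4); } };
  }
};

var a = b
/hi/g.exec(c).map(d);

// No semicolon is inserted after "b", so a is computed as (100 / 5) / 4.
```

If a semicolon had been inserted at the line break, a would simply be 100; with the bindings above it ends up as 5.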

-- 
David-Sarah Hopwood ⚥
