Re: JavaScript 2015?

2015-01-26 Thread Bjoern Hoehrmann
* Axel Rauschmayer wrote:
I’m in the process of coming up with a good title for a book on 
ECMAScript 6. That begs the question: What is the best way to refer to 
ECMAScript 6?

1. The obvious choices: ECMAScript 6 or ES6.
2. Suggested by Allen [1]: JavaScript 2015.

The advantage of #2 is that many people don’t know what ECMAScript 6 is. 
However, I’m worried that a book that has “2015” in its title will 
appear old in 2016.

Well, Microsoft Office 1997 came out in 1996, Office 2000 in 1999... So,
"JavaScript 2016" would be a better title for marketing purposes. There
is also the option to leap ahead a bit further with "JavaScript 3000", but
Python tried that already. Over in the lands of Perl 5, "Modern Perl" is
the catchphrase book title, but that seems to be taken for JavaScript. It
would also be possible to take a cue from the browser vendors and make
it a BoD or e-book offering and increase the version number every six
weeks or so (clearly justified by folding in errata). Another option is
to make reference to the past, like "Post-Snowden JavaScript" or better
perhaps "JavaScript after Snowden". Might make for a good setup to talk
about OO-design, classes, information hiding, and so on...
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
 Available for hire in Berlin (early 2015)  · http://www.websitedev.de/ 
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Re: Figuring out the behavior of WindowProxy in the face of non-configurable properties

2015-01-14 Thread Bjoern Hoehrmann
* Boris Zbarsky wrote:
You say every WindowProxy, but in practice in an ES implementation you 
have some object, it has some internal methods.  This is the last time 
I'm bothering to go through this with you, since clearly we're getting 
nowhere, as I said in https://www.w3.org/Bugs/Public/show_bug.cgi?id=27128

What are the odds that the behavior observable by web pages can actually
be defined sanely such that ES invariants and compatibility requirements
are satisfied? https://www.w3.org/Bugs/Public/show_bug.cgi?id=27128#c15
indicates, as I understand it, the odds may be quite good. In that case,
looking for a volunteer to come up with a proposal might be a good next
step.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
 Available for hire in Berlin (early 2015)  · http://www.websitedev.de/ 


Re: Name of WeakMap

2013-12-16 Thread Bjoern Hoehrmann
* Erik Arvidsson wrote:
At the last f2f2 we talked about renaming WeakMap to SideTable. We
postponed the discussion, saying that we would get back to it later. We
never did.

I would like us to keep the name WeakMap as it is. We didn't really take
WeakSet into account. If we rename WeakMap we would need to rename WeakSet
too and I like the current Map/Set analogy.

(The name SideTable makes me think I seriously need to re-evaluate
what `WeakMap` is supposed to be.)
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Response to Statement from W3C TAG

2013-12-10 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
On Dec 9, 2013, at 5:40 PM, Bjoern Hoehrmann wrote:
 If TC39 said ECMA-404 is going to be replaced by a verbatim copy of the
 ABNF grammar in draft-ietf-json-rfc4627bis-08 with pretty much no other
 discussion of JSON and a clear indication that future editions will not
 add such discussion, and will not change the grammar without IETF con-
 sensus, I would be willing to entertain the idea of making ECMA-404 a
 normative reference.

The second paragraph is speaking about the language described by the 
grammar, not the actual formalism used to express the grammar. I'm quite 
sure that there is no interest at all within TC39 to ever change the 
actual JSON language.  If you are looking for some sort of contractual 
commitment from ECMA, I suspect you are wasting your time. Does the IETF 
make such commitments?

As you know, the charter of the JSON Working Group says

  The resulting document will be jointly published as an RFC and by
  ECMA. ECMA participants will be participating in the working group 
  editing through the normal process of working group participation.  
  The responsible AD will coordinate the approval process with ECMA so 
  that the versions of the document that are approved by each body are 
  the same.

If things had gone according to plan, it seems likely that Ecma would
have requested that the IANA registration for application/json jointly
list the IETF and Ecma International as holding Change Control over it,
and it seems unlikely there would have been much disagreement about that.

It is normal to award change control to other organisations, for
instance, RFC 3023 gives change control for the XML media types to the
W3C. I can look up examples for jointly held change control if that
would help.

And no, I am not looking for an enforceable contract, just a clear
formal decision and statement.

This doesn't mean that TC39 would necessarily agree to eliminate the 
Syntax Diagrams,  or that we wouldn't carefully audit any grammar 
contribution to make sure that it is describing the same language.  
There may also be minor issues that need to be resolved. But we seem to 
agree that we already are both accurately describing the same language 
so this is really about notational agreement.

Having non-normative syntax diagrams in addition to the ABNF grammar
would be fine if they can automatically be generated from the ABNF.

I was talking about removing most of the prose, leaving only boiler-
plate, a very short introduction, and references. Then it would be a
specification of only the syntax and most technical concerns would be
addressed on both sides. If you see this as a viable way forward, then
I think the JSON WG should explore this option further.

As a base line, ECMA-404 was created in less than a week.  It takes a 
couple months to push through a letter ballot to approve a revised 
standard. 

The RFC4627bis draft could be approved and be held for normative
references to materialise; this is not uncommon for IETF standards. It
usually takes a couple of months for the RFC editor to process the
document anyway, so personally a couple of months of waiting for a
revised edition of ECMA-404 would be okay with me.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Response to Statement from W3C TAG

2013-12-09 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
This whole issue of the use of Syntax Diagrams rather than BNF is a 
stylistic debate that is hard to take seriously. If TC39 informed you that 
we are converting the notation used in ECMA-404 to a BNF formalism would 
that end the objections  to normatively referencing  ECMA-404 from 
4627bis?  Unfortunately, I'm pretty sure it wouldn't.

If TC39 said ECMA-404 is going to be replaced by a verbatim copy of the
ABNF grammar in draft-ietf-json-rfc4627bis-08 with pretty much no other
discussion of JSON and a clear indication that future editions will not
add such discussion, and will not change the grammar without IETF con-
sensus, I would be willing to entertain the idea of making ECMA-404 a
normative reference.

How soon would TC39 be able to make such a decision and publish a re-
vised edition of ECMA-404 as described above?
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Response to Statement from W3C TAG

2013-12-08 Thread Bjoern Hoehrmann
* Martin J. Dürst wrote:
The textual descriptions are in some cases quite precise, but in some 
other cases, leave quite a bit of ambiguity. And stuff like "It may have 
an exponent of ten, prefixed by e (U+0065) or E (U+0045) and optionally 
+ (U+002B) or – (U+002D)." (in particular the first clause of that 
sentence) doesn't make much sense. If e.g. 1.2 has an exponent of 10, 
it's going to be 6.1917 or so, not at all what this notation is usually 
used for.

Apparently, in `x²`, 2 is "an exponent of x". That does not make much
sense to me either, but it does appear to be a common English idiom.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Response to Statement from W3C TAG

2013-12-08 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
start JSON text-
{
allenwb:  there is an objectively observable order to the members of a JSON 
object,
JSON WG participant 1:  It would be insane to depend upon that ordering,
allenwb:  not if there is agreement between a producer and consumer on the 
meaning of the ordering,
JSON WG participant 2:  But JSON.parse and similar language bindings don't 
preserve order,
allenwb:  A streaming JSON parser would naturally preserve member order,
JSON WG participant 2: I don't think there are any such parsers,
allenwb: But someone might decide to create one, and if they do it will 
expose object members, in order,
allenwb: Plus, in this particular case the schema is so simple the 
application developer might well design to write a custom, schema specific 
streaming parser.
}
---end JSON text---

There is observable white space outside strings in JSON texts. It would
be insane to depend on the placement of white space outside strings. Not
if there is agreement on the meaning of that white space. Most parsers
do not preserve such white space. A generic ABNF parser would naturally
preserve it...

It is quite possible that there are steganographic or cryptographic pro-
tocols that use insignificant white space in JSON texts as subtle form
of communication or for integrity protection, just like they might use
order of object members for the same purpose.

However, what we are discussing here is what people should assume when
we say "We use JSON!", so that there do not have to be detailed
negotiations to establish agreements, i.e., a Standard. And people should
very much assume that the ordering of object members is as insignificant
as the placement of white space outside strings.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Response to Statement from W3C TAG

2013-12-07 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
On Dec 7, 2013, at 3:55 AM, Nico Williams wrote:
 However, if a schema is also to be allowed to treat them as distinct
 then the *meta-schema* must treat them as distinct.  I.e., no matter
 what generic programming language bindings of JSON one uses, the above
 two JSON texts must produce equivalent results when parsed!

Equivalent according to what definition?

I suspect what was intended was "must not produce".

And I do care about the semantic issues.  They just don't belong in a 
syntactic level specification of the JSON format such as ECMA-404. A 
problem I see with the RFC4627bis is that it conflates a syntactic level 
specification with a just little bit of semantic data model. It is 
neither a pure syntactic specification nor a complete data model.

  JSON_texts = { x | x is a JSON text }

  JSON_diffs = { (a,b) | a and b are elements of JSON_texts and
 a is significantly different from b }

A pure specification in your sense above defines only membership in the
`JSON_texts` set. ECMA-404 is not pure in this sense because it defines
that e.g. `('[]', '[ ]')` is not a member of `JSON_diffs`. 

ECMA-404 does not define that

  ('{"x":1,"y":2}', '{"y":2,"x":1}')

is not a member of `JSON_diffs`. Right? It says the white space in the
example is insignificant, but it does not say order of key-value-pairs
in objects is insignificant. Carsten Bormann gave other examples like
ECMA-404's definition of equivalent escape sequences.

Readers of ECMA-404 might assume that it gives a complete description
of what people developing and operating JSON-based systems agree are
significant differences. They might build systems that rely on the order
of key-value-pairs in objects because of this, for instance

  http://wiki.apache.org/solr/UpdateJSON#Solr_3.1_Example

Systems like ECMAScript's `JSON.stringify` API cannot ordinarily create
such JSON texts and would be unable to interact with such a system. That
is something the IETF JSON Working Group wishes to avoid; accordingly,
it provides a more complete definition of the `JSON_diffs` equivalence
relation that better reflects rough consensus and running code of the
JSON community.
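
A minimal JavaScript sketch of such a `JSON_diffs` test, assuming that
white space outside strings and the order of object members are
insignificant while array order is significant. The helper names
`canonicalize` and `significantlyDifferent` are hypothetical, and the
sketch inherits whatever `JSON.parse` does with duplicate member names
and number formatting:

```javascript
// Hypothetical helper: rebuilds a parsed JSON value with object members
// sorted by name, so that member order cannot influence the comparison.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return value.map(canonicalize); // array order stays significant
  }
  if (value !== null && typeof value === "object") {
    const sorted = Object.create(null); // avoids surprises with "__proto__"
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value; // strings, numbers, booleans, null
}

// True when (a, b) would be a member of JSON_diffs under the assumptions
// stated above; a and b must both be JSON texts.
function significantlyDifferent(a, b) {
  const ca = JSON.stringify(canonicalize(JSON.parse(a)));
  const cb = JSON.stringify(canonicalize(JSON.parse(b)));
  return ca !== cb;
}

console.log(significantlyDifferent('[]', '[ ]'));                       // false
console.log(significantlyDifferent('{"x":1,"y":2}', '{"y":2,"x":1}')); // false
console.log(significantlyDifferent('[1,2]', '[2,1]'));                 // true
```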

I believe the combination of impurity and incompleteness in ECMA-404 is
harmful to the JSON community.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Response to Statement from W3C TAG

2013-12-07 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
On Dec 7, 2013, at 4:55 PM, John Cowan wrote:
 Allen Wirfs-Brock scripsit:
 
 Similarly, the JSON texts:
   {"1":  1, "2": 2}
 and
   {"2":  2, "1": 1}
 
 or the JSON texts:
   {"a": 1, "a": 2}
 and
   {"a": 2, "a": 1}
 
 have an ordering of the object members that must be preserved by the
 parser in order for downstream semantics to be applied.
 
 I cannot accept this statement without proof.  Where in the ECMAscript
 definition does it say this?

In other words, ECMA-262 explicitly specifies that when multiple 
occurrences of the same member name occurs in a JSON object, the value 
associated with the last (right-most) occurrence is used. Order matters.

A similar analysis applies to the first example.  

Your analysis does not demonstrate that `JSON.parse` preserves ordering.
I am confident that even in the current ES6 draft `JSON.stringify` does
not preserve ordering even if `JSON.parse` somehow did. It's based on
`Object.keys` which does not define ordering as currently proposed. If
you can re-create the key-value-pair order in your first example from
the output of `JSON.parse` without depending on implementation-defined
behavior, seeing the code for that would be most instructive.
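
A short JavaScript sketch of the point, assuming an engine that
enumerates integer-like member names in ascending order (which current
engines do, but which should be treated as an assumption here). The
variable names are illustrative only:

```javascript
// Duplicate member names: the last occurrence wins and the earlier one
// leaves no trace in the result.
const dup1 = JSON.parse('{"a": 1, "a": 2}');
const dup2 = JSON.parse('{"a": 2, "a": 1}');
console.log(dup1.a, dup2.a); // 2 1

// Two texts that differ only in member order serialize identically after
// a round trip, so the original order cannot be re-created from the
// parsed value.
const first = JSON.stringify(JSON.parse('{"1": 1, "2": 2}'));
const second = JSON.stringify(JSON.parse('{"2": 2, "1": 1}'));
console.log(first === second); // true
```

Which value survives depends on order, but nothing in the parsed result
records what that order was.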
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

2013-11-26 Thread Bjoern Hoehrmann
* Nico Williams wrote:
We must not require encoding detection functionality in parsers.  We
must not forbid it either.  We might need to say that encodings other
than UTF-8/16/32 may not be reliably detected, therefore they are highly
discouraged, even forbidden except where protocols specifically call for
them.

When I pass a fully conforming UTF-8 encoded application/json entity to
a fully conforming JSON parser I do not want the parser to do something
funny like interpreting the document as if it were Windows-1252 encoded.
I am amazed how many people here think a parser that does that should
not be considered broken.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

2013-11-26 Thread Bjoern Hoehrmann
* Nico Williams wrote:
On Tue, Nov 26, 2013 at 09:15:38PM +0100, Bjoern Hoehrmann wrote:
 * Nico Williams wrote:
 We must not require encoding detection functionality in parsers.  We
 must not forbid it either.  We might need to say that encodings other
 than UTF-8/16/32 may not be reliably detected, therefore they are highly
 discouraged, even forbidden except where protocols specifically call for
 them.
 
 When I pass a fully conforming UTF-8 encoded application/json entity to
 a fully conforming JSON parser I do not want the parser to do something
 funny like interpreting the document as if it were Windows-1252 encoded.
 I am amazed how many people here think a parser that does that should
 not be considered broken.

You missed the point.

"We must require encoding detection functionality in parsers. We must
forbid encoding detection functionality beyond that. We must say that
encodings other than UTF-8/16/32 are forbidden in any and all cases."
is how I would modify what you said above (with some caveats).

Note that I am talking about labeled sequences of octets, application/
json entities, not paintings on a cave wall that look similar to JSON
text in a strange font. In a labeled sequence of octets I can tell for
sure whether there are invisible characters in it if I know the en-
coding.

There are two forms to consider. One is the labeled sequence of octets
that we call application/json entity. The other is a sequence of Uni-
code scalar values. That is the alphabet of the ABNF grammar in the
specification. If you have anything else, then the specification does
not apply to your situation.

If you wanted to forbid non-Unicode, non-UTF encodings, then you'd be
preventing such a shell, and for what reason?  If you only mean that
auto-detection of encoding should not even be mentioned, I'm fine with
that, and I've already said so earlier.

Above I said that there are two forms to consider. Encoding detection
is what allows us to convert the application/json entity form into
the sequence of Unicode scalar values form. We need the latter form
in order to apply the ABNF grammar. Imagine you receive this:

  HTTP/1.1 200 OK
  Content-Type: application/json
  ...

  ABCD...

There would be at least two specifications that apply here, the HTTP
and the application/json specification. Would you like them to say
that you are on your own, "ABCD..." could mean anything? I would like
them to say "ABCD..." is an array with three times the integer zero,
like `[0,0,0]`. I can build robust software based on that.

I cannot build robust software based on "well, maybe it's EBCDIC?
Have you tried GB 18030? UTF-7 might be worth a try otherwise. Are
you sure this matters at all?"
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] BOMs

2013-11-22 Thread Bjoern Hoehrmann
* Matt Miller (mamille2) wrote:
There does not appear to be any consensus on explicitly allowing or 
disallowing of a Byte Order Mark (BOM).  Neither RFC4627 nor the current 
draft mention BOM anywhere, and the modus operandi of the JSON Working 
Group has been to leave text unchanged unless there was wide support.

To be clear, that means application/json entities that start with a byte
sequence that matches U+FEFF encoded in UTF-8/16/32 are malformed because
the ABNF does not allow a U+FEFF at that position (and interpreting such
a sequence as anything other than ordinary character data requires
explicit specification). I do think an informational note saying as much
could be useful.
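
A small JavaScript sketch of that reading, assuming the standard
`TextDecoder` API is available and using ECMAScript's `JSON.parse` as a
stand-in for a grammar check (it accepts more top-level values than
RFC 4627 does). `isJsonText` is a hypothetical helper:

```javascript
// UTF-8 encoded octets for U+FEFF followed by "[]".
const octets = new Uint8Array([0xEF, 0xBB, 0xBF, 0x5B, 0x5D]);

// ignoreBOM: true keeps U+FEFF in the decoded text as ordinary character
// data instead of dropping it; fatal: true rejects malformed UTF-8.
const decoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
const text = decoder.decode(octets);

// Hypothetical helper: reports whether JSON.parse accepts the string.
function isJsonText(s) {
  try {
    JSON.parse(s);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(text.charCodeAt(0).toString(16)); // feff
console.log(isJsonText(text));                // false: U+FEFF is not allowed here
console.log(isJsonText(text.slice(1)));       // true: "[]" on its own is fine
```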
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

2013-11-22 Thread Bjoern Hoehrmann
* Matt Miller (mamille2) wrote:
There does seem to be rough consensus that using an encoding other than 
UTF-8 can have interoperability issues.  There also seems to be rough 
consensus that the current text and table in section 8.1 for detecting 
the encoding will be inaccurate (and potentially harmful).

That appears to mean the approach with the most consensus is to remove
the encoding detection entirely, leaving only:


   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.


Neither of the quoted statements means anything as far as I can tell.
The encoding detection rules are a vital part of the specification and
cannot be removed without replacement. I am not aware of any argument
that the text will be inaccurate or harmful.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] BOMs

2013-11-21 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
On Nov 21, 2013, at 5:28 AM, Henri Sivonen wrote:
 On Thu, Nov 21, 2013 at 7:53 AM, Allen Wirfs-Brock
 al...@wirfs-brock.com wrote:
 Just to be clear about this.  My tests directly tested JavaScript built-in
 JSON parsers WRT to BOM support in three major browsers.  The tests directly
 invoked the built-in JSON.parse functions and directly passed to them a
 source string that was explicitly constructed to contain a BOM code point.

 It would be surprising if JSON.parse() accepted a BOM, since it
 doesn't take bytes as input.

ECMAScript's JSON.parse accepts an ECMAScript string value as its input.
ECMAScript strings are sequences of 16-bit values.  JSON.parse (and most
other ECMAScript functions) interpret those values as Unicode code 
units.  The value U+FEFF can appear at any position within a string. 
When defining a string as an ECMAScript literal, a sequence like \ufeff 
is an escape sequence that means "place the code unit value 0xfeff into 
the string at this position in the sequence". Also note that the actual 
strings passed below to JSON.parse contain the actual code point value 
U+FEFF not the escape sequence that was used to express it.  To include 
the actual escape sequence characters in the string it would have to be 
expressed as '\\ufeff'.

A byte order mark indicates the order of bytes in a sequence of bytes.
An ECMAScript string is not a sequence of bytes and therefore it cannot
have a byte order mark inside it. Your test is not for BOM support but
for an egregious semantic error in the implementation of JSON.parse.

  http://shadowregistry.org/js/misc/#t2ea25a961255bb1202da9497a1942e09

That is a similar test. It makes Firefox see UTF-8 BOMs in ECMAScript
strings. Firefox is not supposed to look for UTF-8 BOMs in ECMAScript
strings because ECMAScript strings are not sequences of bytes at that
level of reasoning.

Is there any chance, by the way, to change `JSON.stringify` so it does
not output strings that cannot be encoded using UTF-8? Specifically,

  JSON.stringify(JSON.parse("\"\uD800\""))

would need to escape the surrogate instead of emitting it literally.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] BOMs

2013-11-21 Thread Bjoern Hoehrmann
* John Cowan wrote:
Bjoern Hoehrmann scripsit:

 Is there any chance, by the way, to change `JSON.stringify` so it does
 not output strings that cannot be encoded using UTF-8? Specifically,
 
   JSON.stringify(JSON.parse("\"\uD800\""))
 
 would need to escape the surrogate instead of emitting it literally.

No, there isn't.  We've been down this road repeatedly.  People can and
do use JSON strings to encode arbitrary sequences of unsigned 16-bit integers.

The output of `JSON.stringify("\uD800")` contains no backslash character.
If you call `utf8_encode(JSON.stringify("\uD800"))` you get an exception
because UTF-8 cannot encode the lone surrogate and `utf8_encode` does
not know it could encode it as `\uD800` without loss of information. If
`JSON.stringify` produced an escape sequence instead, there would be no
problem passing the output to `utf8_encode`.
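
A rough JavaScript sketch of the escaping step, assuming it is applied
to JSON text after serialization. `escapeLoneSurrogates` is a
hypothetical helper, not an existing API. The standard `TextEncoder`
stands in for `utf8_encode` here; unlike that function it substitutes
U+FFFD for a lone surrogate rather than throwing. On engines whose
`JSON.stringify` already escapes lone surrogates the helper changes
nothing:

```javascript
// Hypothetical helper: replaces every unpaired surrogate code unit in a
// JSON text with the equivalent \uXXXX escape sequence. Properly paired
// surrogates are left alone.
function escapeLoneSurrogates(jsonText) {
  const lone =
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;
  return jsonText.replace(lone, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0"));
}

const safe = escapeLoneSurrogates(JSON.stringify("\uD800"));

// The escaped text survives a trip through UTF-8 unchanged ...
const viaUtf8 = new TextDecoder().decode(new TextEncoder().encode(safe));
console.log(viaUtf8 === safe);                  // true

// ... and still parses back to the original lone surrogate.
console.log(JSON.parse(viaUtf8) === "\uD800");  // true
```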
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] BOMs

2013-11-19 Thread Bjoern Hoehrmann
* Tatu Saloranta wrote:
Dominant Java implementations support UTF-16 with BOM; either directly or
through Java's Reader implementations that handle BOMs.
String concatenation case seems irrelevant, since BOMs are not included in
in-memory representation anyway, as opposed to byte stream serialization.

HTTP implementations cannot correctly determine whether an entity body
is text in a single character encoding and if so what that encoding is,
accordingly the dominant API deals in byte[] arrays, not text Strings;
furthermore, many programming languages default to byte[] arrays for
string literals. That often combines into forms of

  byte[] json = sprintf('{"x": %s, "y": %s}', GET(...), GET(...));

which works fine if all three byte[] arrays are UTF-8 encoded and use
no Unicode signature, which is the case 99% of the time.
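
A JavaScript sketch of how this pattern breaks when one of the parts
carries a UTF-8 signature. The byte arrays below are hypothetical
stand-ins for the web service responses, and `concatBytes`, `build` and
`parses` are illustrative helpers:

```javascript
const encoder = new TextEncoder();

// Concatenates byte arrays without looking at their contents.
function concatBytes(...parts) {
  const total = parts.reduce((n, p) => n + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// Stand-ins for two responses: one plain, one with a UTF-8 signature.
const plain = encoder.encode("[1]");
const withSignature = Uint8Array.of(0xEF, 0xBB, 0xBF, ...encoder.encode("[2]"));

// Byte-level equivalent of the sprintf call above.
function build(x, y) {
  return concatBytes(
    encoder.encode('{"x": '), x, encoder.encode(', "y": '), y,
    encoder.encode("}"));
}

// Decodes as UTF-8 and reports whether JSON.parse accepts the result.
function parses(bytes) {
  const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parses(build(plain, plain)));         // true
console.log(parses(build(plain, withSignature))); // false: U+FEFF mid-text
```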
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: BOMs

2013-11-18 Thread Bjoern Hoehrmann
* Martin J. Dürst wrote:
As for what to say about whether to accept BOMs or not, I'd really want 
to know what the various existing parsers do. If they accept BOMs, then 
we can say they should accept BOMs. If they don't accept BOMs, then we 
should say that they don't.

Unicode signatures are not useful for application/json resources and are
likely to break existing and future code; it is not at all uncommon to
construct JSON text by concatenating, say, string literals with some web
service response without passing the data through a JSON parser. And as
RFC 4627 makes no mention of them, there is little reason to think that
implementations tolerate them.

Perl's JSON module gives me

  malformed JSON string, neither array, object, number, string
  or atom, at character offset 0 (before \x{ef}\x{bb}\x{bf}[])

Python's json module gives me

  ValueError: No JSON object could be decoded

Go's encoding/json module gives me

  invalid character 'ï' looking for beginning of value

http://shadowregistry.org/js/misc/#t2ea25a961255bb1202da9497a1942e09 is
another example of what kinds of bugs await us if we were to specify the
use of Unicode signatures for JSON, essentially

  new DOMParser().parseFromString("\uBBEF\u3CBF\u7979\u3E2F", "text/xml")

Now U+BBEF U+3CBF U+7979 U+3E2F is not an XML document but Firefox and
Internet Explorer treat it as if it were equivalent to `<yy/>`.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: BOMs

2013-11-18 Thread Bjoern Hoehrmann
* Henry S. Thompson wrote:
I'm curious to know what level you're invoking the parser at.  As
implied by my previous post about the Python 'requests' package, it
handles application/json resources by stripping any initial BOM it
finds -- you can try this with

 import requests
 r=requests.get("http://www.ltg.ed.ac.uk/ov-test/b16le.json")
 r.json()

The Perl code was

  perl -MJSON -MEncode -e 'my $s = encode_utf8(chr 0xFEFF) . q([]); JSON->new->decode($s)'

The Python code was

  import json
  json.loads(u"\uFEFF[]".encode('utf-8'))

The Go code was

  package main

  import "encoding/json"
  import "fmt"

  func main() {
    // U+FEFF followed by an empty array; []byte(r) yields its UTF-8 bytes.
    r := "\uFEFF[]"

    var f interface{}
    err := json.Unmarshal([]byte(r), &f)

    fmt.Println(err)
  }

In other words, always passing a UTF-8 encoded byte string to the byte
string parsing part of the JSON implementation. RFC 4627 is the only
specification for the application/json on-the-wire format and it does
not mention anything about Unicode signatures. Looking for certain byte
sequences at the beginning and treating them as a Unicode signature is
the same as looking for `/* ... */` and treating it as a comment.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: es-discuss Digest, Vol 81, Issue 82

2013-11-18 Thread Bjoern Hoehrmann
* mn...@google.com wrote:
The first four bytes are:

   00 00 00 22  UTF-32BE
   00 22 65 E5  UTF-16BE
   22 00 00 00  UTF-32LE
   22 00 E5 65  UTF-16LE
   22 E6 97 A5  UTF-8

The UTF-16 bytes don't match the patterns in RFC, so UTF-16 streams would
(wrongly) be detected as UTF-8, if one strictly follows the RFC.

RFC 4627 does not allow string literals at the top level.
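
A JavaScript sketch of the detection rule from RFC 4627 section 3, which
looks at the pattern of zero octets in the first four octets. It relies
on the first two characters of a JSON text being ASCII, which holds when
the text must start with `[` or `{`. `detectEncoding` is a hypothetical
name:

```javascript
// Classifies an application/json entity by the pattern of zero octets in
// its first four octets (0 = zero octet, x = non-zero octet).
function detectEncoding(octets) {
  if (octets.length < 4) {
    return "UTF-8"; // too short for any UTF-16 or UTF-32 JSON text
  }
  const pattern = Array.from(octets.slice(0, 4), (o) => (o === 0 ? "0" : "x"))
    .join("");
  switch (pattern) {
    case "000x": return "UTF-32BE";
    case "0x0x": return "UTF-16BE";
    case "x000": return "UTF-32LE";
    case "x0x0": return "UTF-16LE";
    default:     return "UTF-8";
  }
}

// "[1" in UTF-16BE: both leading characters are ASCII, detection works.
console.log(detectEncoding([0x00, 0x5B, 0x00, 0x31])); // UTF-16BE

// A top-level string starting with U+65E5 in UTF-16BE: the second
// character is not ASCII, so the pattern falls through to UTF-8. Such a
// text is not a JSON text under RFC 4627 in the first place.
console.log(detectEncoding([0x00, 0x22, 0x65, 0xE5])); // UTF-8
```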
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: ArrayClass should imply @@isConcatSpreadable

2013-10-29 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
On Oct 28, 2013, at 5:52 PM, Domenic Denicola wrote:

 From: Allen Wirfs-Brock [mailto:al...@wirfs-brock.com]
 
 So what's so onerous about returning a fresh array from the getter each 
 time it was called.
 
 The fact that `api.property !== api.property`.

You mean people want to do identity checks of the value of the property? Why?

It might be helpful to look at it the other way around. If you spend a
day chasing down a bug that boils down to the problem above triggered in
some subtle way, would you say that the issue should have been caught in
code review and there should have been tests for issues like this? Would
you say it's safe to change a popular API from returning the same object
to returning a different object every time?

It is not difficult to end up with code that does something like this:

  var temp = a.example;
  ...
  if (...) {
    ...
    temp = b.example;
  }
  ...
  if (temp === a.example) {
    ...
  } else /* temp === b.example */ {
    ...
  }

which works fine so long as `a.example` returns the same object; if it
does not, this would always take the `else` branch, but that is hard to
spot and counter-intuitive considering such odd behavior is very rare.
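
A minimal sketch of how this bites, assuming a hypothetical `api` object
whose `property` getter builds a fresh array on every access. The names
`api`, `cachedApi`, and `items` are made up for illustration and do not
come from the proposal under discussion.

```javascript
// Hypothetical API whose getter returns a fresh array each time.
const api = {
  get property() {
    return [1, 2, 3];
  }
};

// Hypothetical variant that hands out the same (frozen) array every time.
const items = Object.freeze([1, 2, 3]);
const cachedApi = {
  get property() {
    return items;
  }
};

const temp = api.property;

// false: the second access created a second, distinct array
const freshIsSame = temp === api.property;

// true: both accesses return the same object
const cachedIsSame = cachedApi.property === cachedApi.property;

console.log(freshIsSame, cachedIsSame);
```

With the first variant, a check like `temp === a.example` can never
succeed, no matter what happened in between.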
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Working with grapheme clusters

2013-10-26 Thread Bjoern Hoehrmann
* Norbert Lindenberg wrote:
On Oct 25, 2013, at 18:35 , Jason Orendorff jason.orendo...@gmail.com wrote:

 UTF-16 is designed so that you can search based on code units
 alone, without computing boundaries. RegExp searches fall in this
 category.

Not if the RegExp is case insensitive, or uses a character class, or ., or a
quantifier - these all require looking at code points rather than UTF-16 code
units in order to support the full Unicode character set.

If you have a regular expression over an alphabet like Unicode scalar
values it is easy to turn it into an equivalent regular expression over
an alphabet like UTF-16 code units. I have written a Perl module that
does it for UTF-8, http://search.cpan.org/dist/Unicode-SetAutomaton/;
Russ Cox's http://swtch.com/~rsc/regexp/regexp3.html#step3 is a popular
implementation. In effect it is still as though the implementation used
Unicode scalar values, but that would be true of any implementation. It
is much harder to implement something like this for other encodings like
UTF-7 and Punycode.

It is useful to keep in mind features like character classes are just
syntactic sugar and can be decomposed into regular expression primitives
like a choice listing each member of the character class as literal. The
`.` is just a large character class, and flags like //i just transform
parts of an expression where /a/i becomes something more like /a|A/.
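
As a rough illustration of such a rewriting, here is one hand-translated
character class. It assumes the ES6 `u` flag, which postdates this
thread, as the reference that matches on code points; the range
U+1F600..U+1F64F is chosen arbitrarily. A class that spans several high
surrogates would need a choice of several such alternatives.

```javascript
// Character class over Unicode scalar values (needs the `u` flag).
const overScalars = /^[\u{1F600}-\u{1F64F}]$/u;

// Equivalent expression over UTF-16 code units: every member of the range
// shares the high surrogate 0xD83D, and the low surrogates run from
// 0xDE00 to 0xDE4F.
const overCodeUnits = /^\uD83D[\uDE00-\uDE4F]$/;

// Compare both expressions on code points inside and around the range.
const samples = [0x1F5FF, 0x1F600, 0x1F620, 0x1F64F, 0x1F650, 0x41];
const agree = samples.every(cp => {
  const s = String.fromCodePoint(cp);
  return overScalars.test(s) === overCodeUnits.test(s);
});

console.log(agree);
```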
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Working with grapheme clusters

2013-10-26 Thread Bjoern Hoehrmann
* Claude Pache wrote:
You might know that the following ES expressions are broken:

   text.charAt(0) // get the first character of the text
   text.length > 100 ? text.substring(0,100) + '...' : text // cut the text after 100 characters

The reason is *not* because ES works with UTF-16 code units instead of 
Unicode code points (it's just a red herring!), but because _graphemes_ 
(that is, what a human perceives as a character) may span multiple 
code units and/or code points.

The example is deceptively simple. Truncating a string is a hard problem
and a high quality implementation would probably be language-specific to
avoid problematic truncations like when a suffix changes the meaning of
a prefix; it would also take special characters into account, say you do
not want the last character before the "..." to be an open quote mark,
and if the string is 101 characters ending in ... turning that into a
string of 103 characters ending in . would also be silly.

Another issue that is often ignored is that you might want to use the
truncated text in combination with other text, say in an HTML document
with a "more" or "permalink" or some such link after it. Something like

  <p>ABC &#x202E; DEF &#x202C; GHI <a href='...'>more</a></p>
  <p>ABC &#x202E; DEF ... <a href='...'>more</a></p>

The second paragraph will render "ABC erom ... FED" because the control
character that restores the bidirectional text state got lost when the
string was truncated. These are all issues that counting graphemes
instead of 16 bit units does not address and it is not clear to me that it
would actually be an improvement.
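
The lost control character can be reproduced with a few lines. This is
only a sketch; the cut-off position 9 is picked so that the truncation
falls between the two control characters.

```javascript
// "ABC", RIGHT-TO-LEFT OVERRIDE (U+202E), "DEF",
// POP DIRECTIONAL FORMATTING (U+202C), "GHI".
const bidiText = 'ABC \u202E DEF \u202C GHI';

// Naive truncation by UTF-16 code units, as in the quoted expression.
const truncated = bidiText.substring(0, 9) + '...';

// The override is still present, but the character that ends it is gone,
// so any text placed after the truncated string is affected as well.
const hasOverride = truncated.includes('\u202E');
const hasPop = truncated.includes('\u202C');

console.log(hasOverride, hasPop);
```

Cutting at a grapheme cluster boundary instead of a code unit boundary
would lose the U+202C just the same.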

"User-perceived character" is not an intuitive notion especially once
you leave the realm of letters from a familiar script. In a string that
contains 1 user-perceived character, what is the maximum number of zero-
width spaces in that string? The maximum number of scalar values? What
is the maximum width and maximum height of such a string when rendered,
the maximum number of UTF-8 bytes needed to encode such a string? Should
one perceive a horizontal ellipsis as three characters, or is it just
one? How many are two thin spaces?

My smartphone comes with a News application that displays the latest
headlines from various news sources and links to corresponding articles.
If you use it for a day or two you will notice that it's not of German
design, but one for a language that uses fewer or narrower grapheme
clusters per unit of information if you will. Many of the headlines do
not convey what the article might be about. A current example is
'Codename Lustre - Frankreich liefert' which is roughly 'code name lustre
- France supplies' ... what? What does France supply? Or 'Dortmund droht
historische Pleite im', roughly 'Dortmund faces historic ... in', where
'Pleite' could be bankruptcy, defeat, failure, ...; it could be sport,
could be finance, can't tell.

That makes the application rather frustrating to use with German news. I
imagine it works better with English headlines which tend to use fewer
grapheme clusters. So truncating news headlines after a certain number
of grapheme clusters untailored to the specific script and language is
not the right design choice. Actually, it might be truncated by pixel
measures because there is a visual space to fit, but English and German
are very similar in their pixels per grapheme cluster metrics...

So it seems rather unlikely for someone to say "so we need the first 100
extended grapheme clusters as defined in UAX #29 of the string" and then
someone responding "yes, that is clearly the right solution".
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Working with grapheme clusters

2013-10-24 Thread Bjoern Hoehrmann
* Mathias Bynens wrote:
Out of curiosity, is there any programming language that operates on 
grapheme clusters (rather than code points) by default? FWIW, code point 
iteration is what I’d expect in any language.

It is the specified default for Perl 6 that can be modified through
lexically scoped pragmas. I do not know the state of implementation.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
The utility of a hypothetical 'at' method is presumably exactly that of 
'codePointAt'. 

   str.at(p)
would just be a convenience  for expressing
   String.fromCodePoint(str.codePointAt(p))

So the real question is probably, how common is that  use case.

Certainly not common enough to warrant a two-character method on the
native string type. Odds are people will use it incorrectly in an
attempt to make their code look concise, not understanding that it'll
retrieve a substring of .length 1 or 2, possibly consisting of a lone
surrogate, based on a 16 bit index that might fall in the middle of a
character; the problematic cases are fairly rare, so it's hard to
notice improper use of `.at` in automated testing or in code review.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Bjoern Hoehrmann
* Mathias Bynens wrote:
On 19 Oct 2013, at 12:15, Bjoern Hoehrmann derhoe...@gmx.net wrote:

 Certainly not common enough to warrant a two-character method on the
 native string type. Odds are people will use it incorrectly in an
 attempt to make their code look concise […]

Are you saying that changing the name to something that is longer than 
`at` would solve this problem?

If it was `.getOneOrTwoCodepointLongSubstringAtUcs2CodeUnitIndex(...)`
I am sure people would be reluctant using it because it's unreasonably
long compared to `String.fromCodePoint(str.codePointAt(p))` and harder
to understand than the combination of those two primitives.

 […] not understanding that it'll retrieve a substring of .length 1 or 2,
 possibly consisting of a lone surrogate, based on a 16 bit index that
 might fall in the middle of a character; the problematic cases are
 fairly rare, so it's hard to notice improper use of `.at` in automated
 testing or in code review.

People are using `String.prototype.charAt()` incorrectly too, expecting
it to return whole symbols instead of surrogate halves wherever possible.
How would _not_ introducing a method that avoids this problem help?

Right now people do not have much of a choice other than writing code
that does not do the right thing when faced with malformed strings or
non-BMP characters, it's unreasonable to call a method like `substr`
and then manually smooth it up around the edges and perhaps scan the
interior for lone surrogates to ensure that at least your code doesn't
do the wrong thing. That gives you well-known bad code, which is a
good thing to have, better than more complicated code that might have
unknown bugs. Allen's loop `for (let p=0; p<str.length; p+=c.length)`
for instance is just waiting for someone to improve or replace it with
code that increments by `1` instead of `.length` because that's simpler.
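
For reference, a loop of that shape might look like the following
sketch. It is reconstructed from the fragment quoted above and is not
Allen's actual code; the function name `codePointStrings` is made up for
illustration.

```javascript
// Split a string into substrings of one code point each.
function codePointStrings(str) {
  const result = [];
  let c;
  for (let p = 0; p < str.length; p += c.length) {
    // One code point as a string of .length 1 or 2; a lone surrogate
    // comes back as a string of .length 1.
    c = String.fromCodePoint(str.codePointAt(p));
    result.push(c);
  }
  return result;
}

// "a", U+1F600 (a surrogate pair), "b"
const input = 'a\uD83D\uDE00b';
const parts = codePointStrings(input);

// The tempting "simplification" p += 1 also visits the low surrogate.
const naive = [];
for (let p = 0; p < input.length; p += 1) {
  naive.push(String.fromCodePoint(input.codePointAt(p)));
}

console.log(parts.length, naive.length);
```

The first loop yields three pieces, the second four, with a lone low
surrogate as the extra one.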

The methods `fromCodePoint` and `codePointAt` can be used to get ugly
constants out of code that tries to do the right thing, and they will
offer some insight into how developers might go from UCS-only code to
something more proper, but for the moment duplicating all the UCS-based
methods strikes me as premature, especially when giving them seductive
names. How would a somewhat-surrogate-aware `substring` method work and
what would it be called, for instance? If it is omitted, we would be
back to square one, someone in need of substring functionality has to
jump through overly complicated hoops to make it work correctly and
ends up mixing surrogate-pair-aware with -unaware code.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: [Json] FYI ECMA, W3C, IETF coordination on JSON

2013-10-09 Thread Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:
As far as I know, nobody has suggested that TC39 should issue a 
standard relating to this encoding level or concerning the JSON MIME 
type.  This seems like an appropriate subject area for the IETF.

Per http://www.ietf.org/mail-archive/web/json/current/msg00267.html ECMA
would like to jointly publish the document the IETF JSON WG is making,
and the IETF JSON WG is explicitly chartered towards that end, quoting
http://tools.ietf.org/wg/json/charters?item=charter-json-2013-05-31.txt:

  The resulting document will be jointly published as an RFC and by 
  ECMA. ECMA participants will be participating in the working group 
  editing through the normal process of working group participation.  
  The responsible AD will coordinate the approval process with ECMA
  so that the versions of the document that are approved by each body 
  are the same.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: On `String.prototype.codePointAt` and `String.fromCodePoint`

2013-09-25 Thread Bjoern Hoehrmann
* Anne van Kesteren wrote:
I think I'm convinced that String.fromCodePoint()'s design is correct,
especially since the rendering subsystem deals with code points too.
String.prototype.codePointAt() however still feels wrong since you
always need to iterate from the start to get the correct code *unit*
offset anyway so why would you use it rather than the code *point*
iterator that is planned for inclusion?

UTF-16 is a self-synchronizing code and you need to move at most one
`.length` unit to get to a proper `.codePointAt` index in a properly
formed string. You only need to start from the beginning if you care
about what is between the start and the given index position. If you
want to treat proper surrogate pairs as one unit for counting, then
`.codePointAt` lets you do

  while (ix < s.length) {
    ix += s.codePointAt(ix) > 0xFFFF;
    ix += 1;
  }

That perhaps also illustrates why making the method return a replace-
ment character for unpaired surrogates is a bad idea: you may violate

  count_unicode(s1 + s2) === count_unicode(s1) + count_unicode(s2)

if this concatenates two halves of a surrogate pair. The `.codePointAt`
method is for random indexing, iterators are for sequential access.
Random indexing into strings is rare except for a few special positions,
but it happens through user input for instance (give me the Unicode
scalar value of the first character of the current text selection).
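
Both points can be sketched in a few lines: resynchronizing an arbitrary
index by moving back at most one code unit, and counting proper
surrogate pairs as one unit. The helper names `alignToCodePoint` and
`countCodePoints` are hypothetical.

```javascript
// Move back at most one code unit so that ix does not point at the second
// half of a surrogate pair.
function alignToCodePoint(s, ix) {
  const cur = s.charCodeAt(ix);
  const prev = s.charCodeAt(ix - 1); // NaN at ix === 0, comparisons fail
  const isLow = cur >= 0xDC00 && cur <= 0xDFFF;
  const prevIsHigh = prev >= 0xD800 && prev <= 0xDBFF;
  return isLow && prevIsHigh ? ix - 1 : ix;
}

// Count proper surrogate pairs as one unit; everything else, including
// lone surrogates, counts as one unit per code unit.
function countCodePoints(s) {
  let count = 0;
  let ix = 0;
  while (ix < s.length) {
    ix += s.codePointAt(ix) > 0xFFFF ? 2 : 1;
    count += 1;
  }
  return count;
}

const sample = 'x\uD83D\uDE00y'; // "x", U+1F600, "y"
const aligned = alignToCodePoint(sample, 2);

console.log(aligned, sample.codePointAt(aligned).toString(16));
console.log(countCodePoints(sample));
```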
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Code points vs Unicode scalar values

2013-09-22 Thread Bjoern Hoehrmann
* Anne van Kesteren wrote:
ES6 introduces String.prototype.codePointAt() and
String.codePointFrom() as well as an iterator (not defined). It struck
me this is the only place in the platform where we'd expose code point
as a concept to developers.

Nowadays strings are either 16-bit code units (JavaScript, DOM, etc.)
or Unicode scalar values (anytime you hit the network and use utf-8).

I'm not sure I'm a big fan of having all three concepts around. We
could have String.prototype.unicodeAt() and String.unicodeFrom()
instead, and have them translate lone surrogates into U+FFFD. Lone
surrogates are a bug and I don't see a reason to expose them in more
places than just the 16-bit code units.

I would regard that as silent data corruption which has the odd habit of
causing hazardous anomalies in code and makes reasoning about it harder.

This is akin to adding edge cases. There are many desirable properties a
function or its implementation can have, like purity and idempotence or
reflexivity. When functions and relations have such properties 99.99% of
the time, people tend to write code as if it had them without exception.

An example I came across today is this:

  var parsed = JSON.parse("-0");
  1 / parsed === -Infinity; // true
  1 / JSON.parse(JSON.stringify(parsed)) === -Infinity; // false

That is, JSON.parse preserves negative zero, but JSON.stringify does
not. Well, in Firefox and Webkit; in Opera 12.x both comparisons are
false. If JSON.stringify did not silently corrupt negative zero into
positive zero, we would probably have one less bug to contend with.

If you look at `codePointAt` over the domain of strings of .length 1 at
the first position, then it is injective, in fact, it's the identity
function. And if you apply the `fromCodePoint` method to the output of
`codePointAt` in this case, the data roundtrips nicely. If instead the
functions would silently corrupt data, if `codePointAt` returned 0xFFFD
when the input was 0xFFFD but also when hitting a lone surrogate, these
properties would be lost.
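
The round trip can be checked exhaustively for strings of .length 1. The
function `lossyCodePointAt` below is a hypothetical stand-in for the
proposed replacement-character behavior, not an existing method.

```javascript
// For every string of .length 1, codePointAt(0) is the code unit itself,
// lone surrogates included, and fromCodePoint maps it back unchanged.
let roundTrips = true;
for (let unit = 0; unit <= 0xFFFF; unit += 1) {
  const s = String.fromCharCode(unit);
  const cp = s.codePointAt(0);
  if (cp !== unit || String.fromCodePoint(cp) !== s) {
    roundTrips = false;
  }
}

// Hypothetical variant that substitutes U+FFFD for lone surrogates. It is
// no longer injective: distinct inputs map to the same output.
function lossyCodePointAt(s, ix) {
  const cp = s.codePointAt(ix);
  return cp >= 0xD800 && cp <= 0xDFFF ? 0xFFFD : cp;
}
const collides =
  lossyCodePointAt('\uD800', 0) === lossyCodePointAt('\uFFFD', 0);

console.log(roundTrips, collides);
```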

Relatedly, if `codePointAt` would throw an exception when hitting a lone
surrogate, you may very well end up with a bug that breaks your whole
application because someone accidentally put an emoji character at the
wrong position in a string in a database and there is some unfortunate
freak combination of code unit oriented API calls, like .substring or a
regular expression, that splits the emoji in the middle. Returning an
error code, like a negative number or undefined, might have the same
effect, depending on what happens if you pass those values to other
string-related functions.

Note that emitting a replacement character when encountering character
encoding errors in bitstreams is a well-known form of hazardous silent
data corruption and systems that require integrity forbid doing that. As
an example, the WebSocket protocol requires implementations to consider
a WebSocket connection fatally broken upon encountering a malformed UTF-8
sequence in a text frame. That is the right thing to do because when the
sender of those bytes sends the wrong bytes, it may also send the wrong
byte count, meaning payload data might be misinterpreted as frame and
message meta data (useful only to attackers); and on the receiving end,
emitting replacement characters might change the byte length of a
string, but some code accidentally uses the unmodified byte length in
further processing which quickly leads to memory corruption bugs, which
are very bad.
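
The same distinction shows up in the `TextDecoder` interface of the
Encoding Standard, used here purely as an illustration; it is not part
of the WebSocket protocol, and the sketch assumes `TextDecoder` and
`TextEncoder` are available as globals, as in current browsers and
Node.js.

```javascript
// 0xC3 starts a two-byte UTF-8 sequence, but 0x28 is not a continuation byte.
const malformed = new Uint8Array([0xC3, 0x28]);

// Default mode: silently substitute U+FFFD for the malformed part.
const lenient = new TextDecoder('utf-8').decode(malformed);

// Fatal mode: refuse to decode malformed input.
let rejected = false;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(malformed);
} catch (err) {
  rejected = true;
}

// After substitution the text re-encodes to a different byte length.
const reencodedLength = new TextEncoder().encode(lenient).length;

console.log(JSON.stringify(lenient), rejected);
console.log(malformed.length, reencodedLength);
```

Code that keeps using the original byte length after the lenient decode
is working with the wrong number.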

Unfortunately ECMAScript makes it very difficult to ensure you do not
generate strings with unpaired surrogate code points somewhere in them,
it's as easy as taking the first 157 .length units from a string and
perhaps appending "..." to abbreviate it. And it's a freak accident if
that actually happens in practise because non-BMP characters are rare.
We should be very reluctant to introduce hazards hoping to improve our
Unicode hygiene.
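
A small sketch of that accident, with a string constructed so that the
cut falls inside a surrogate pair. The helper `hasLoneSurrogate` is a
hypothetical name for a straightforward scan.

```javascript
// 156 BMP characters followed by U+1F600, which occupies two code units.
const original = 'a'.repeat(156) + '\uD83D\uDE00' + ' and more text';

// Abbreviate after 157 .length units, splitting the surrogate pair.
const abbreviated = original.substring(0, 157) + '...';

// Report whether a string contains a surrogate that is not part of a pair.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i += 1) {
    const cp = s.codePointAt(i);
    if (cp >= 0xD800 && cp <= 0xDFFF) return true;
    if (cp > 0xFFFF) i += 1; // skip the second half of a proper pair
  }
  return false;
}

console.log(hasLoneSurrogate(original), hasLoneSurrogate(abbreviated));
```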
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Chained comparisons from Python and CoffeeScript

2013-07-22 Thread Bjoern Hoehrmann
* Andy Earnshaw wrote:
typeof null == "null" is a different case though.  typeof is a requirement
for checking the existence of pre-declared variables, so you could expect
something like, if (typeof someVar === "object" && someVar === null), to
appear at least in a few places on the web.  Tab's saying that this
proposal wouldn't break much (if anything) because code isn't written like
this anywhere: it wouldn't be readable or reliable.  Writing a < b < c in
ES<=5 would be either stupidity or ignorance (in the case of the latter
then this proposal would probably fix more code than it breaks).

If that code is interpreted reliably and the behavior is desired, there
is no reason to assume that e.g. code obfuscators would not produce it.
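
For reference, this is how such expressions are interpreted today,
without any chained-comparison semantics:

```javascript
// `a < b < c` already has a well-defined meaning: (a < b) < c, where the
// boolean result of the first comparison is converted to 0 or 1.
const ascending = 1 < 2 < 3;  // (1 < 2) < 3, i.e. 1 < 3
const descending = 3 > 2 > 1; // (3 > 2) > 1, i.e. 1 > 1
const surprising = 1 < 3 < 2; // (1 < 3) < 2, i.e. 1 < 2

console.log(ascending, descending, surprising);
```

Under Python-style chaining the second expression would become true and
the third false, so any code relying on the current results would change
meaning.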
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Chained comparisons from Python and CoffeeScript

2013-07-22 Thread Bjoern Hoehrmann
* Brendan Eich wrote:
We could also introduce binary <=>, AKA cmp, return -1, 0, or 1. 
Imagine the sort fun:

   a.sort((a, b) => a <=> b)

:-P. The win over using

   a.sort((a, b) => a - b)

is that <=> would work as expected for string-typed a and b as well.

In Perl <=> compares as if the operands had been converted to numbers
and the cmp operator as if the operands had been converted to strings.
If you are suggesting to have `<=>` change behavior based on type
inspection at runtime, that behavior might lead to subtle bugs when the
array contains strings and numbers (e.g., you might expect swapping
the operands is the same as reversing the result, but it might not be.
Indeed, without specifying the exact sorting algorithm you might get
different sort orders in different implementations due to differences
in which values are compared in which order). Having two operators
avoids that.
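
The existing relational operators already behave this way, which makes
for a convenient illustration. The comparator `typeSniffingCompare` is a
hypothetical stand-in for a type-inspecting `<=>`; the values are chosen
to form a cycle.

```javascript
// A comparator whose behavior depends on the runtime types of its operands:
// `<` compares two strings as strings, a string and a number as numbers.
function typeSniffingCompare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Each value sorts before the next, and the last before the first.
const cycle = [
  typeSniffingCompare('10', '9'), // string comparison
  typeSniffingCompare('9', 9.5),  // numeric comparison
  typeSniffingCompare(9.5, '10')  // numeric comparison
];

// Two separate comparators, one per interpretation, stay consistent.
const byNumber = (a, b) => Number(a) - Number(b);
const byString = (a, b) => {
  const x = String(a);
  const y = String(b);
  return x < y ? -1 : x > y ? 1 : 0;
};

const numeric = ['10', '9', 9.5].sort(byNumber);
const textual = ['10', '9', 9.5].sort(byString);

console.log(cycle, numeric, textual);
```

With a cyclic comparator the result of `sort` depends on which pairs the
implementation happens to compare.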

The other day I published http://search.cpan.org/dist/List-OrderBy/
which takes a lesson from .NET and offers better syntax for multi-key
sorts, `my @sorted = order_by { ... } then_by { ... } @unsorted;`. I
ended up with a not-so-nice list of variants to support ascending and
descending order and numeric and string comparisons. Thinking about
that now makes me wonder if the Voyager holodeck has and requires an
Arch console for advanced uses. In Haskell you would never write code
like `a.sort((a, b) => a <=> b)` as, to paraphrase, `a.sort(<=>)` is
much more clear.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Future cancellation

2013-05-01 Thread Bjoern Hoehrmann
* Jonas Sicking wrote:
Then there's of course the issue of what we should do with APIs that
combine several Futures into a single one. Like Future.every() etc.

Similarly, there's also the issue of what to do with chaining.

I'm tempted to say that if you create combined or dependent Futures,
you still only have the ability to cancel them through the original
CancelableFuture.

And the progress of multiple Futures can only be observed through
the individual ProgressFuture objects? I would expect the opposite.
Similarly, I would expect to be able to mix ProgressFuture objects
with other Future objects, and still be able to observe progress
of the combination. And if I can do that, I would also expect that I
can turn a single Future into a ProgressFuture in this sense, but
then the whole subclassing idea kinda breaks down; why bother with
that? And cancelation does not seem quite so different from
progress in this sense.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Future cancellation

2013-04-30 Thread Bjoern Hoehrmann
* Jonas Sicking wrote:
I do not think that we should add cancellation on the base Future
interface. I.e. we shouldn't make *all* Futures cancellable.

Cancelability should only be possible when the implementation of the
Future would actually stop doing work if the Future is cancelled. I.e.
cancelling a Future shouldn't simply prevent the result callbacks from
being called, but it should prevent whatever work is needed to
calculate the result from happening.

However it would be very complex and expensive if we had to make all
APIs that want to use Futures also support being cancelled.

You seem to be arguing based on the word "cancel". The semantic would
rather be "Please note that I am no longer interested in this value",
which would make sense even if the work is not actually stopped,
and, accordingly, it would not be complex and expensive to support.

In other words, I think most people would agree that a "forced stop"
option for everything is complex and expensive and should not be
required, but that does not necessarily invalidate the ideas that
motivate a "cancel" option.
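
A minimal sketch of that weaker semantic, using plain callbacks rather
than the proposed Future API. The names `subscribe`, `deliver`, and
`cancel` are hypothetical. The producer is never stopped; the consumer
merely stops being notified.

```javascript
// Register interest in a value that will be delivered later.
function subscribe(onValue) {
  let interested = true;
  return {
    // Called by the producer once the value is available.
    deliver(value) {
      if (interested) onValue(value);
    },
    // Called by the consumer: "I am no longer interested in this value."
    cancel() {
      interested = false;
    }
  };
}

const received = [];
const first = subscribe(v => received.push(v));
const second = subscribe(v => received.push(v));

second.cancel();       // costs the producer nothing
first.deliver('one');  // delivered
second.deliver('two'); // the work was done, the result is dropped

console.log(received);
```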

The solution is to create a subclass of Future which allows the
back-end work to be cancelled. I.e. a CancelableFuture, or
AbortableFuture. This subclass would have a .cancel() or .abort()
method on it. The FutureResolver created when the CancelableFuture is
created would have a callback which is called when .cancel()/.abort()
is called.

Future subclasses seem rather dubious to me.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: ES6,ES7,ES8 and beyond. A Proposed Roadmap.

2013-04-22 Thread Bjoern Hoehrmann
* Sam Tobin-Hochstadt wrote:
What exactly would be the semantic difference between this and just using
'yield'?

If you consider it from the perspective of someone reading the code, you
might find, as an example, `try { ... yield ... }` rather weird (how can
yielding control fail?) while `try { ... await ... }` is fairly obvious.
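
In generator terms, the `try` around a `yield` catches an error that the
code driving the generator injects with `throw()`, which is what a
scheduler would do when the awaited operation fails. A small sketch,
with the hypothetical generator `loadImage` standing in for a task:

```javascript
function* loadImage() {
  try {
    const pixels = yield 'please fetch the image';
    return 'drew ' + pixels;
  } catch (err) {
    return 'fallback because ' + err.message;
  }
}

// The driver resumes the task with a value...
const good = loadImage();
good.next();
const drawn = good.next('pixels');

// ...or resumes it with an error, which surfaces at the `yield`.
const bad = loadImage();
bad.next();
const failed = bad.throw(new Error('download failed'));

console.log(drawn.value);
console.log(failed.value);
```

The mechanics are sound, but a reader has to know about the driver to
see why the `yield` expression can throw at all.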
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: ES6,ES7,ES8 and beyond. A Proposed Roadmap.

2013-04-21 Thread Bjoern Hoehrmann
* Andrea Giammarchi wrote:
not sure I understand those examples, but the moment a developer starts
yielding everything, is the moment all non-blocking asynchronous advantages
are gone 'cause you are waiting instead of keep doing the rest, isn't it?

A simple example would be an application running in a web browser. You
may want to draw three images on a canvas, but the images have to be
downloaded first, and while they are downloading, you still want to
respond to user input. With only a single thread of execution, you have
to yield control to the browser while awaiting the images to do so. In
other words, you may not be able to "keep doing the rest", but rather
want to do all the other things meanwhile.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: ES6,ES7,ES8 and beyond. A Proposed Roadmap.

2013-04-21 Thread Bjoern Hoehrmann
* Sam Tobin-Hochstadt wrote:
I don't see what the point of `await` is in your gist.  It looks like
all of the work is being done by `function^`, which looks to be sugar
for creating a function and passing it to a scheduler like `Q.async`
or `taskjs.spawn`.  We could add that sugar if we wanted, and not need
to add `await`.

The only language construct that allows yielding control until resumed
later so far is `yield`, but using `yield` to `wait` is rather awkward.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: ES6,ES7,ES8 and beyond. A Proposed Roadmap.

2013-04-21 Thread Bjoern Hoehrmann
* Sam Tobin-Hochstadt wrote:
On Sun, Apr 21, 2013 at 9:20 AM, Bjoern Hoehrmann derhoe...@gmx.net wrote:
 * Sam Tobin-Hochstadt wrote:
I don't see what the point of `await` is in your gist.  It looks like
all of the work is being done by `function^`, which looks to be sugar
for creating a function and passing it to a scheduler like `Q.async`
or `taskjs.spawn`.  We could add that sugar if we wanted, and not need
to add `await`.

 The only language construct that allows yielding control until resumed
 later so far is `yield`, but using `yield` to `wait` is rather awkward.

First, using `yield` is precisely correct in the cooperative
concurrency module of JavaScript. Second, are you genuinely suggesting
that we should add a new keyword with the same semantics just because
the word choice might be awkward in English?

A function yielding a value to the caller with the option for the caller
to resume the function later is very different from a function asking to
be resumed once something has happened or has become available. I would
certainly want to express the distinction in the clearest way possible,
and there is nothing unusual about that, there are many other languages
that allow to express this distinction. I would not define the two names
so that you can always replace one by the other, though.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: First crack at a Streams proposal

2013-04-20 Thread Bjoern Hoehrmann
* Tab Atkins Jr. wrote:
On Sat, Apr 20, 2013 at 9:19 AM, Isaac Schlueter i...@izs.me wrote:
 I'm not seeing what in this proposal can't be implemented in
 JavaScript as it is today.  Is there an implementation of this
 somewhere?  Are there any programs that use these streams?

This is a fully-general counter-argument against literally everything
that doesn't require new primitives, and so is useless as an actual
argument.

It is unlikely that he meant to make a useless argument. I would take it
as a request for references to running code so we can get a better idea
of how this is implemented and used in practise to aid the discussion. I
believe some references have already been given in the various threads,
perhaps it would be useful to collect some of them on your website?
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Observability of NaN distinctions — is this a concern?

2013-03-25 Thread Bjoern Hoehrmann
* Kenneth Russell wrote:
No. The typed array views (everything except DataView) have used the
host machine's endianness from day one by design -- although the typed
array spec does not state this explicitly. If desired, text can be
added to the specification to this effect.

That seems to be called for.

Thanks,
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Throwing StopIteration in array extras to stop the iteration

2013-03-04 Thread Bjoern Hoehrmann
* Jeff Walden wrote:
On 03/03/2013 06:49 PM, Rick Waldron wrote:
 Is this +1 to findIndex? 

Not that I much care between the two, just making sure another 
reasonable name is considered, but I'm not sure why it wouldn't be named 
find rather than findIndex.  The index seems like the only bit you'd 
reasonably be looking to find.  (Well, maybe existence, but I'd expect a 
name like contains for that, or just indexOf !== -1.)

`find` finds the first element matching the supplied predicate in C#,
Groovy, Haskell, and Scala, and the first element that is equal to the
supplied value in C++, to mention a few examples. I think Python has a
`find` method on strings that returns the index, but that's not generic.
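
To make the element-returning flavour concrete, a rough sketch as a
stand-alone function. The name findElement is hypothetical, and returning
undefined for "no match" is an assumption of this sketch; the languages
mentioned each signal that case in their own way.

```javascript
// Hypothetical helper: returns the first element for which the predicate
// holds, or undefined when no element matches.
function findElement(array, predicate) {
  for (var i = 0; i < array.length; i++) {
    if (predicate(array[i], i, array)) {
      return array[i];
    }
  }
  return undefined;
}

var firstEven = findElement([3, 8, 7, 10], function (x) {
  return x % 2 === 0;
});
console.log(firstEven); // 8
```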
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Throwing StopIteration in array extras to stop the iteration

2013-03-03 Thread Bjoern Hoehrmann
* David Bruant wrote:
One (minor) annoyance with forEach/map, etc. is that the enumeration 
can't be stopped until all elements have been traversed, which doesn't 
suit every use case. One hack to stop the enumeration is to throw an 
error, but that requires wrapping the .forEach call in a try/catch block, 
which is annoying too for code readability.

The iterator protocol defines the StopIteration value. What about 
reusing this value in the context of array extras?

Using exceptions for normal flow control seems like a bad idea to me.

    (function(){
        [2, 8, 7].forEach(function(e){
            if(e === 8)
                throw StopIteration;
            console.log(e)
        })

        console.log('yo')
    })();

Languages like Haskell and C# would use `takeWhile` for this purpose,
so you would have something like

  [2, 8, 7].takeWhile(x => x !== 8).forEach(x => console.log(x));

That seems much better to me.
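
Arrays have no built-in takeWhile, so the line above assumes one exists.
A minimal sketch of such a helper, written as a stand-alone function
rather than a method; the name and the argument order are assumptions of
this example.

```javascript
// Hypothetical helper, not a built-in: returns the leading elements for
// which the predicate holds and stops at the first element that fails it.
function takeWhile(array, predicate) {
  var taken = [];
  for (var i = 0; i < array.length; i++) {
    if (!predicate(array[i], i, array)) {
      break; // normal control flow, no exception needed
    }
    taken.push(array[i]);
  }
  return taken;
}

takeWhile([2, 8, 7], function (x) { return x !== 8; })
  .forEach(function (x) { console.log(x); }); // logs 2
```

The early exit stays inside the helper, so the caller's callback has no
need to throw anything.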
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Throwing StopIteration in array extras to stop the iteration

2013-03-03 Thread Bjoern Hoehrmann
* David Bruant wrote:
I've found myself multiple times in a situation where I needed the index 
of the first element satisfying some condition. I solved it the 
following way:

    var index;
    array.some(function(e, i){
        if(someCondition(e)){
            index = i;
            return true;  // a truthy return value stops some()
        }

        return false;     // keep looking
    })

It's usually a bad idea to trigger side-effects from such callbacks. The
proper solution here would be having a suitable method available, like

  var index = array.findIndex(someCondition);

as it is called in Haskell and C#.
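
A rough sketch of what such a method could do, written here as a
stand-alone function. Returning -1 when nothing matches is an assumption
of this sketch, chosen to mirror indexOf.

```javascript
// Hypothetical stand-alone version: returns the index of the first element
// for which the predicate holds, or -1 when no element matches.
function findIndex(array, predicate) {
  for (var i = 0; i < array.length; i++) {
    if (predicate(array[i], i, array)) {
      return i;
    }
  }
  return -1;
}

var index = findIndex([2, 8, 7], function (e) { return e === 8; });
console.log(index); // 1
```

The callback only answers a yes/no question; the result comes back as
the return value instead of through an outer variable.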
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 