Re: Unicode normalization problem

2015-04-02 Thread Mathias Bynens
On Thu, Apr 2, 2015 at 1:39 AM, Andrea Giammarchi
andrea.giammar...@gmail.com wrote:
 Jordan, the purpose of `Array.from` is to iterate over the string, and the
 point of iterating instead of splitting is to get code points automagically.
 That is, unless I've misunderstood Mathias's presentation (which I might have).

 So, here there is a different problem: there are code points that do not
 correspond to a separate visible symbol ...

Those are called grapheme clusters or just “graphemes”, as Boris
mentioned. And here’s how to deal with them:
https://mathiasbynens.be/notes/javascript-unicode#other-grapheme-clusters

“Unicode Standard Annex #29 describes [an algorithm for determining
grapheme cluster
boundaries](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).
For a _completely_ accurate solution that works for all Unicode
scripts, implement this algorithm in JavaScript, and then count each
grapheme cluster as a single symbol.”
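
As a rough sketch of "count each grapheme cluster as a single symbol": engines that ship the later ECMA-402 `Intl.Segmenter` API (not available when this thread was written) expose UAX #29 segmentation directly. The helper name `countGraphemes` and the default locale are assumptions made for illustration only.

```javascript
// Sketch only: assumes the runtime provides Intl.Segmenter.
// countGraphemes is a hypothetical helper, not part of any standard library.
function countGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  let count = 0;
  // Each iteration step yields one grapheme cluster of the input.
  for (const _segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// 'e' followed by U+0301 COMBINING ACUTE ACCENT
console.log(Array.from('e\u0301').length); // 2 (code points)
console.log(countGraphemes('e\u0301'));    // 1 (grapheme cluster)
```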

 or maybe the real problem is a broken `Array.from` polyfill?

`Array.from` just uses `String.prototype[Symbol.iterator]` internally,
and that is defined to deal with code points, not grapheme clusters.
Either choice would have confused some developers. IIRC, Perl 6 has
built-in capabilities to deal with grapheme clusters, but until ES
does, this use case must be addressed in user-land.
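
A small illustration of that distinction, using two sample strings chosen here purely as examples (U+1F4A9 as an astral code point, and `n` plus a combining tilde):

```javascript
// U+1F4A9 lies outside the Basic Multilingual Plane.
const astral = '\u{1F4A9}';
console.log(astral.length);             // 2 (16-bit code units)
console.log(Array.from(astral).length); // 1 (the string iterator yields code points)

// 'n' followed by U+0303 COMBINING TILDE renders as one symbol,
// but it is still two code points, so the iterator yields two items.
const combined = 'n\u0303';
console.log(combined.length);             // 2
console.log(Array.from(combined).length); // 2
```

So a correct `Array.from` fixes the surrogate-pair case only; the combining-mark case still needs grapheme-aware code.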
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Re: Unicode normalization problem

2015-04-02 Thread Brendan Eich
It was the 90s, when 16 bits seemed enough. Wish we could go back. Even 
in 1995 this was obviously going to fail, but the die had been cast 
years earlier in Windows and Java APIs and language/implementation designs.


/be

Claude Pache wrote:

(So, taking your example, the  character is internally represented as a 
sequence of two 16-bit-units, not “characters”. And, very confusingly, the 
String methods that contain “char” in their name have nothing to do with 
“characters”.)

—Claude



Re: Unicode normalization problem

2015-04-02 Thread Claude Pache

 On 2 Apr 2015, at 01:22, Jordan Harband ljh...@gmail.com wrote:
 
 Unfortunately we don't have a String#codepoints or something that would 
 return the number of code points as opposed to the number of characters (that 
 length returns) - something like that imo would greatly simplify explaining 
 the differences to people.
 
 For the time being, I've been explaining that some characters are actually 
 made up of two, and the  character (it's a fun example to use) is an example 
 of two characters combining to make one code point. It's not a quick or 
 trivial thing to explain but people do seem to grasp it eventually.

And when they think they have understood, they are in fact still in great
trouble, because they will confuse it with other, unrelated issues such as
grapheme clusters and/or precomposed characters.
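
To make the precomposed-character issue concrete, here is a minimal sketch with sample strings chosen for illustration. Both strings stay inside the Basic Multilingual Plane, so no surrogate pairs are involved at all:

```javascript
const precomposed = '\u00E9'; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = 'e\u0301'; // 'e' + U+0301 COMBINING ACUTE ACCENT

// The two strings render identically but compare as different.
console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 1 2

// Normalizing both sides to the same form makes them comparable.
console.log(decomposed.normalize('NFC') === precomposed); // true
console.log(precomposed.normalize('NFD') === decomposed); // true
```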

The issue here is specific to the UTF-16 encoding, where some Unicode code
points are encoded as a sequence of two 16-bit units; and ES strings are (by an
accident of history) sequences of 16-bit units, not Unicode code points. I
think it is important to stress that this is an encoding issue, if only to
have a chance of distinguishing it from the other issues mentioned above.

(So, taking your example, the  character is internally represented as a 
sequence of two 16-bit-units, not “characters”. And, very confusingly, the 
String methods that contain “char” in their name have nothing to do with 
“characters”.)
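
A sketch of the encoding step itself, assuming U+1F4A9 as the sample code point; `toSurrogatePair` is a hypothetical helper written for this illustration:

```javascript
// Split a supplementary code point (U+10000..U+10FFFF) into its
// UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('not a supplementary code point');
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const s = String.fromCodePoint(0x1F4A9);
console.log(toSurrogatePair(0x1F4A9).map((u) => u.toString(16))); // [ 'd83d', 'dca9' ]

// The "char" methods operate on 16-bit units, not on characters.
console.log(s.charCodeAt(0).toString(16));  // 'd83d'
console.log(s.charCodeAt(1).toString(16));  // 'dca9'
console.log(s.codePointAt(0).toString(16)); // '1f4a9'
```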

—Claude



Re: Re-exporting imports and CreateImportBinding assertions

2015-04-02 Thread Adam Klein
I've added this to a few bugs on the bug-tracker:

https://bugs.ecmascript.org/show_bug.cgi?id=4184 (CreateImportBinding)
https://bugs.ecmascript.org/show_bug.cgi?id=4244 (GetExportedNames and
ResolveExport)

On Wed, Apr 1, 2015 at 4:31 PM, Adam Klein ad...@chromium.org wrote:

 I have a question about CreateImportBinding(N, M, N2) (where N is the name
 to create in the importing module, and M is a module which exports N2).

 Step 4 of
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-createimportbinding
 is the following assertion

 Assert: When M.[[Environment]] is instantiated it will have a direct
 binding for N2.

 What about the case where M is simply re-exporting an import? Consider:

 -
 module 'a':

 import { x } from 'b';

 -
 module 'b':

 import { x } from 'c';
 export { x };

 -
 module 'c':

 export let x = 42;

 -

 In this case, when running CreateImportBinding(x, 'b', x) in module 'a',
 the assertion fails, as x in 'b' is an immutable indirect binding (itself
 created by CreateImportBinding).

 Is there a need for this assert I'm missing? I don't think skipping over
 this assert, or removing "direct" from its wording, will cause any
 problems. Also, the term "direct binding" is not defined anywhere that I
 can find, except as the negation of the "indirect binding" created by
 CreateImportBinding.

 Note that there's a similar issue in ResolveExport: step 4.a.i of
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-resolveexport
 asserts that resolved exports found in [[LocalExportEntries]] are "leaf
 bindings" (another term that goes undefined), where by the usual CS
 definition of "leaf" the assertion would be false for x in 'b' (when
 resolved from 'a').
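
The situation can be modelled with a toy resolver. This is not the spec's ResolveExport or its record types; the `modules` table, its field names, and `resolveExport` below are all assumptions made up for illustration. It only shows that resolving x through 'b' has to follow the import before reaching a module that declares the binding itself:

```javascript
// Toy model of the three modules in the example above.
// locals: bindings declared in the module itself
// imports: local name -> where it was imported from
// exports: export name -> local name
const modules = {
  a: { locals: {}, imports: { x: { from: 'b', name: 'x' } }, exports: {} },
  b: { locals: {}, imports: { x: { from: 'c', name: 'x' } }, exports: { x: 'x' } },
  c: { locals: { x: 42 }, imports: {}, exports: { x: 'x' } },
};

// Follow re-exported imports until a module that declares the binding is found.
function resolveExport(moduleName, exportName, seen = new Set()) {
  const key = `${moduleName}:${exportName}`;
  if (seen.has(key)) return null; // circular re-export
  seen.add(key);

  const mod = modules[moduleName];
  const localName = mod.exports[exportName];
  if (localName === undefined) return null; // not exported

  if (Object.prototype.hasOwnProperty.call(mod.locals, localName)) {
    return { module: moduleName, bindingName: localName }; // declared here
  }
  const imported = mod.imports[localName];
  return imported ? resolveExport(imported.from, imported.name, seen) : null;
}

console.log(resolveExport('b', 'x')); // { module: 'c', bindingName: 'x' }
```

In this model, x in 'b' is never the end of the chain, which is the case the assertions do not seem to account for.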

 - Adam
