On Sat, Dec 22, 2012 at 9:52 AM, Joshua Bell <[email protected]> wrote:
> You should take a look at
> http://wiki.ecmascript.org/doku.php?id=harmony:unicode_supplementary_characters
> if you haven't, and look at the es-discuss archives
> https://mail.mozilla.org/listinfo/es-discuss for various discussions of
> improving Unicode handling in ES6.

I'm glad there's discussion on the subject, at least! Of course, the
compatibility problems are very much there.

http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#UTF32

> The short version is that the next version of ECMAScript is gaining some
> capabilities to handle non-BMP code points more sensibly, but these will be
> rather limited and provide close to the bare minimum necessary for
> processing strings with "astral" data.
>
> I realize that's somewhat orthogonal to your point which is about v8
> internals, but ECMAScript itself is still firmly mired in the world of
> 16-bit code units. FWIW, Web APIs are also sticking with DOMStrings
> comprised of 16-bit code units.

The main problem is backward compatibility. I'll see if I can join the
ES discussion (as if I don't already have more mailing lists than I
can keep up with!), but this is also an implementation issue. The
flexible string representation depends on strings being immutable, as
they are in both Python and Pike, and ECMAScript fits that too. It'd
be very efficient with handling the common case where a UTF-8 string
contains no bytes >0x7F, as the original string buffer can be used to
represent the string itself (assuming that it's owned by the right
subsystem, etc).

I'd like to see this as an openly backward-incompatible change. It's
the easiest way forward - acknowledge that the previous behaviour is
buggy, and make it possible to run a script in non-buggy mode.

ChrisA

-- 
v8-users mailing list
[email protected]
http://groups.google.com/group/v8-users
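
To make the flexible string representation idea above a bit more concrete, here is a rough sketch in plain JavaScript. It is not V8 code and not a proposal for how V8 should do it; the helper names (`toCodePoints`, `chooseWidth`, `pack`, `isAscii`) are made up for illustration. It assumes the same scheme Python uses (PEP 393): since the string is immutable, pick the narrowest fixed width (1, 2 or 4 bytes per character) that fits its largest code point, once, at creation time.

```javascript
// Illustrative sketch only; all function names here are hypothetical.

// Decode 16-bit code units into code points, joining surrogate pairs.
// Lone surrogates are passed through unchanged.
function toCodePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    const hi = str.charCodeAt(i);
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
      const lo = str.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        out.push(((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000);
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    out.push(hi);
  }
  return out;
}

// Narrowest width (in bytes per character) that fits every code point.
function chooseWidth(codePoints) {
  let max = 0;
  for (const cp of codePoints) {
    if (cp > max) max = cp;
  }
  if (max <= 0xFF) return 1;   // Latin-1 range
  if (max <= 0xFFFF) return 2; // Basic Multilingual Plane
  return 4;                    // "astral" code points
}

// Build the fixed-width storage. Indexing into `data` is O(1) and
// always yields a whole code point, never half of a surrogate pair.
function pack(str) {
  const cps = toCodePoints(str);
  const width = chooseWidth(cps);
  const Ctor =
    width === 1 ? Uint8Array : width === 2 ? Uint16Array : Uint32Array;
  return { width, data: Ctor.from(cps) };
}

// The UTF-8 fast path: if no byte is above 0x7F the input is pure
// ASCII, so the bytes are already a valid 1-byte-wide representation.
function isAscii(bytes) {
  for (const b of bytes) {
    if (b > 0x7F) return false;
  }
  return true;
}
```

With this, `pack("hello")` gets a width of 1, `pack("\u20ac")` a width of 2, and `pack("a\uD83D\uDE00")` a width of 4 with two elements, whereas `"a\uD83D\uDE00".length` reports 3 code units today.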
