Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Mathias Bynens
Norbert echoes my thoughts perfectly: Using a Unicode escape for non-textual data seems like abuse to me - Unicode is a character encoding standard. For Unicode, anything beyond six hex digits is excessive. Allen, what use cases for using Unicode escapes / strings for non-textual data did you

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Tab Atkins Jr.
On Tue, Jan 24, 2012 at 5:14 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: The current 16-bit character strings are sometimes used to store non-Unicode binary data and can be used with non-Unicode character encodings with up to 16-bit chars. 21 bits is sufficient for Unicode but perhaps is

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Allen Wirfs-Brock
On Jan 24, 2012, at 11:45 PM, Norbert Lindenberg wrote: I don't see the standard allowing character encodings other than UTF-16 in strings. Section 8.4 says "When a String contains actual textual data, each element is considered to be a single UTF-16 code unit." This aligns with other

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread John Tamplin
On Wed, Jan 25, 2012 at 12:46 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: Arbitrary 16-bit values can be placed in a String using either String.fromCharCode (15.5.3.2) or the \u notation in string literals. Neither of these enforces a requirement that individual String elements are
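
A minimal sketch of the point quoted above, assuming any ES5-or-later engine: both String.fromCharCode and the four-digit \u escape accept any 16-bit value, including an unpaired surrogate that is not valid UTF-16 on its own. The helper names packWords and unpackWords are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: store arbitrary 16-bit values as String elements.
function packWords(words) {
  return String.fromCharCode.apply(null, words);
}

// Hypothetical helper: read the 16-bit values back, one per element.
function unpackWords(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    out.push(str.charCodeAt(i));
  }
  return out;
}

// 0xD800 is a lone high surrogate and 0xFFFF is a noncharacter;
// neither is rejected when building the string.
var words = [0x0041, 0xD800, 0xFFFF, 0x0000];
var packed = packWords(words);
var roundTripped = unpackWords(packed);

// The \u notation is equally permissive.
var loneSurrogate = "\uD800";
```

The values survive the round trip unchanged, which is why such strings can carry binary data as well as text.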

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Gillam, Richard
The current 16-bit character strings are sometimes used to store non-Unicode binary data and can be used with non-Unicode character encodings with up to 16-bit chars. 21 bits is sufficient for Unicode but perhaps is not enough for other useful encodings. 32-bit seems like a plausible unit.

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Mark Davis ☕
You can't use \u10FFFF as syntax, because that could be \u10FF followed by literal FF. A better syntax is \u{...}, with 1 to 6 digits, values from 0 .. 10FFFF. Mark *— Il meglio è l’inimico del bene —* [https://plus.google.com/114199149796022210033] On Wed, Jan 25, 2012 at 10:59,
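
The ambiguity described above can be observed directly. This is a sketch that assumes an ES2015-or-later engine, where the braced form \u{...} was eventually standardized; the variable names are illustrative only.

```javascript
// A four-digit \u escape consumes exactly four hex digits, so this literal
// is U+10FF followed by the two ordinary characters "F" and "F".
const ambiguous = "\u10FFFF";

// The braced form delimits the digits explicitly, so the whole value
// 10FFFF is read as one code point (stored as a surrogate pair).
const braced = "\u{10FFFF}";

const firstUnit = ambiguous.charCodeAt(0);   // code unit of the first element
const trailingText = ambiguous.slice(1);     // what is left after the escape
const bracedCodePoint = braced.codePointAt(0);
```

Here ambiguous has three elements while braced denotes a single code point, which is the distinction the braces are meant to make unambiguous.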

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Mark Davis ☕
(oh, and I agree with your other points) Mark *— Il meglio è l’inimico del bene —* [https://plus.google.com/114199149796022210033] On Wed, Jan 25, 2012 at 11:11, Mark Davis ☕ m...@macchiato.com wrote: You can't use \u10FFFF as syntax, because that could be \u10FF followed by literal

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Gillam, Richard
Mark-- Of course. Sorry. That should have been "\U10FFFF is equivalent to \udbff\udfff", with a capital U, or "\u{10FFFF} is equivalent to \udbff\udfff". --Rich On Jan 25, 2012, at 11:11 AM, Mark Davis ☕ wrote: You can't use \u10FFFF as syntax, because that could be \u10FF followed by literal
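
The surrogate pair \udbff\udfff named above is the UTF-16 encoding of U+10FFFF. A short sketch of the standard UTF-16 arithmetic follows; toSurrogatePair is a hypothetical helper name, and the final comparison assumes an ES2015-or-later engine that accepts \u{...}.

```javascript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point: " + codePoint);
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const pair = toSurrogatePair(0x10FFFF);

// In ES2015+, the braced escape and the explicit pair denote the same string.
const sameString = "\u{10FFFF}" === "\uDBFF\uDFFF";
```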

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Allen Wirfs-Brock
On Jan 25, 2012, at 9:54 AM, John Tamplin wrote: On Wed, Jan 25, 2012 at 12:46 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: Arbitrary 16-bit values can be placed in a String using either String.fromCharCode (15.5.3.2) or the \u notation in string literals. Neither of these

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread John Tamplin
On Wed, Jan 25, 2012 at 2:33 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: It isn't clear from your source code what encoding issues you have actually identified. I suspect that you are talking about what happens when an external resource (an application/javascript file) which may be in

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Allen Wirfs-Brock
On Jan 25, 2012, at 11:37 AM, John Tamplin wrote: On Wed, Jan 25, 2012 at 2:33 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: It isn't clear from your source code what encoding issues you have actually identified. I suspect that you are talking about what happens when an external

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Allen Wirfs-Brock
On Jan 25, 2012, at 10:59 AM, Gillam, Richard wrote: The current 16-bit character strings are sometimes used to store non-Unicode binary data and can be used with non-Unicode character encodings with up to 16-bit chars. 21 bits is sufficient for Unicode but perhaps is not enough for other

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread John Tamplin
On Wed, Jan 25, 2012 at 2:55 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: The primary intent of the proposal was to extend ES Strings to support a uniform representation of all Unicode characters, including non-BMP. That means that any Unicode character should occupy exactly one element
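
To illustrate the "one element per character" idea quoted above, the sketch below models it on top of today's 16-bit strings. It does not implement the strawman; it assumes an ES2015-or-later engine, and toCodePoints is a hypothetical helper name.

```javascript
// Hypothetical helper: view a string as an array with one entry per
// Unicode code point instead of one entry per 16-bit code unit.
// Array.from iterates a string by code points, pairing up surrogates.
function toCodePoints(str) {
  return Array.from(str, (ch) => ch.codePointAt(0));
}

const sample = "x\u{1F600}y";             // U+1F600 lies outside the BMP
const unitCount = sample.length;          // counts 16-bit code units
const codePoints = toCodePoints(sample);  // one entry per character
```

With 16-bit elements the non-BMP character takes two elements, whereas the array view gives it a single slot, which is the uniformity the quoted message describes.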

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Allen Wirfs-Brock
On Jan 25, 2012, at 12:25 PM, John Tamplin wrote: On Wed, Jan 25, 2012 at 2:55 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: The primary intent of the proposal was to extend ES Strings to support a uniform representation of all Unicode characters, including non-BMP. That means that any

Re: Question about the “full Unicode in strings” strawman

2012-01-24 Thread Mark S. Miller
On Tue, Jan 24, 2012 at 12:33 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: Note that this proposal isn't currently under consideration for inclusion in ES.next, but the answer to your question is below [...] Just as the current definition of string specifies that a String is a sequence

Re: Question about the “full Unicode in strings” strawman

2012-01-24 Thread Allen Wirfs-Brock
On Jan 24, 2012, at 2:11 PM, Mark S. Miller wrote: On Tue, Jan 24, 2012 at 12:33 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: Note that this proposal isn't currently under consideration for inclusion in ES.next, but the answer to your question is below [...] Just as the current

Re: Question about the “full Unicode in strings” strawman

2012-01-24 Thread Norbert Lindenberg
I don't see the standard allowing character encodings other than UTF-16 in strings. Section 8.4 says "When a String contains actual textual data, each element is considered to be a single UTF-16 code unit." This aligns with other normative references to UTF-16 in sections 2, 6, and 15.1.3.
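
As a small illustration of elements being UTF-16 code units, the sketch below walks a string element by element and classifies each unit. It should run on any ES5-or-later engine; the helper names isHighSurrogate and isLowSurrogate are hypothetical.

```javascript
// Hypothetical helpers: classify a 16-bit code unit by its UTF-16 range.
function isHighSurrogate(unit) {
  return unit >= 0xD800 && unit <= 0xDBFF;
}
function isLowSurrogate(unit) {
  return unit >= 0xDC00 && unit <= 0xDFFF;
}

// "a" followed by U+1F600, written out as its surrogate pair.
var text = "a\uD83D\uDE00";

// Each String element is one code unit, so indexing sees the pair as two.
var units = [];
for (var i = 0; i < text.length; i++) {
  units.push(text.charCodeAt(i));
}
```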

Question about the “full Unicode in strings” strawman

2012-01-22 Thread Mathias Bynens
http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings#unicode_escape_sequences states: To address this issue, a new form of UnicodeEscapeSequence is added that is explicitly tagged as containing a variable number (up to 8) of hex digits. The new definition is: