From: Arcane Jill
> I would be very surprised if it did, since Java chars are still only
> sixteen bits wide,
Yes, but they include surrogates as valid values for char, so UTF-16 could be used to represent variable names. The main problem is not with variables, member variables and methods, but with class and package names, which need to be mappable into filenames to be stored in a local filesystem (at compile time) or in a zip directory entry (for packaged applications). Not all characters are usable in valid filenames, because of the way filesystems may rearrange, normalize or transcode these names.

> and the new math alphanumerics are not in BMP. Still, I'd be very happy to be proved wrong on this one.

For now the JLS (http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html) defines the language's lexical translation as supporting only Unicode 2.1 (everything else must use "Unicode escapes", notably for characters outside the BMP, which need to be represented as "\uD8xx\uDCxx" and can then only be used within String constants).

In section 3.8 -- Identifiers -- you'll find this:

[quote]
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. An identifier cannot have the same spelling (Unicode character sequence) as a keyword (§3.9), boolean literal (§3.10.3), or the null literal (§3.10.7).

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.

A "Java letter" is a character for which the method Character.isJavaIdentifierStart returns true. A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart returns true.
[/quote]

As the valid characters usable in identifiers need to be mappable into a Character instance (which only supports UCS-2 code points) so that Character.isJavaIdentifierStart() or Character.isJavaIdentifierPart() can return true, including characters outside the BMP in identifiers would require that surrogates be included in the list of possible Character instances for which those tests return true.

So let's see the Character class documentation:

[quote]
isJavaIdentifierPart

public static boolean isJavaIdentifierPart(char ch)

Determines if the specified character may be part of a Java identifier as other than the first character. A character may be part of a Java identifier if any of the following are true:

* it is a letter (matches the general categories UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER, MODIFIER_LETTER, OTHER_LETTER)
* it is a currency symbol (such as '$')
* it is a connecting punctuation character (such as '_')
* it is a digit
* it is a numeric letter (such as a Roman numeral character)
* it is a combining mark
* it is a non-spacing mark
* isIdentifierIgnorable returns true for the character

Parameters: ch - the character to be tested.
Returns: true if the character may be part of a Java identifier; false otherwise.
Since: 1.1
See Also: isIdentifierIgnorable(char), isJavaIdentifierStart(char), isLetterOrDigit(char), isUnicodeIdentifierPart(char)
[/quote]

The requirements above make surrogates unsuitable for identifiers, simply because surrogates have no general category that matches them:

* they are neither letters, currency symbols, and so on, because surrogates have NONE of the general categories listed above;
* the Character class just lists them with a SURROGATE general category, see: static int Character.getType(char ch);
* but isDefined() returns false for them, as they don't have an entry in the UCD or a value in a range defined in the UCD.

I doubt that this can be changed.
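The argument above can be checked with a small sketch. It assumes only the char-based methods quoted from the Character documentation; the class name SurrogateIdentifierCheck and its two helper methods are hypothetical names made up for this illustration, and the check deliberately ignores the JLS rule that excludes keywords and literals.

```java
// Hypothetical helper class for illustration only; not part of the JDK or the JLS.
class SurrogateIdentifierCheck {

    // Applies the JLS 3.8 rule one UTF-16 code unit at a time:
    // the first char must be a "Java letter", the rest "Java letter-or-digit".
    // Keywords and literals are NOT excluded here (simplification).
    static boolean isIdentifierByCodeUnit(String s) {
        if (s.length() == 0 || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // True if the Character class reports the SURROGATE general category.
    static boolean isSurrogateUnit(char ch) {
        return Character.getType(ch) == Character.SURROGATE;
    }

    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A, written as a surrogate pair.
        String mathBoldA = "\uD835\uDC00";

        for (int i = 0; i < mathBoldA.length(); i++) {
            char unit = mathBoldA.charAt(i);
            System.out.println(Integer.toHexString(unit)
                    + " surrogate=" + isSurrogateUnit(unit)
                    + " identifierStart=" + Character.isJavaIdentifierStart(unit)
                    + " identifierPart=" + Character.isJavaIdentifierPart(unit));
        }

        System.out.println("total$_1 -> " + isIdentifierByCodeUnit("total$_1"));
        System.out.println("surrogate pair -> " + isIdentifierByCodeUnit(mathBoldA));
    }
}
```

Each half of the surrogate pair is tested on its own, which is all a 16-bit char parameter allows, so a character outside the BMP never gets a chance to be classified as a letter.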

