The next version of the Unicode Standard will be Version 3.2.0, due for release in March 2002. The beta period for this version runs until January XXX, 2001.
During this beta period, updated Unicode Character Database files are available for public comment. We strongly encourage implementers to download these files and test them with their programs well before the end of the beta period. The files are located at:

http://www.unicode.org/Public/BETA/

This version adds 1016 new characters, new properties, additional conformance clauses, and textual clarifications.

====================
New Characters
====================

The primary feature of Unicode 3.2 is the addition of 1016 new encoded characters. These additions consist of several Philippine scripts (Tagalog, Hanunoo, Buhid, Tagbanwa), a large collection of mathematical symbols, and small sets of other letters and symbols.

Architectural additions include:

Variation Selectors

    A variation selector selects a different appearance of an already encoded character. It is not intended as a general code extension mechanism. Only the sequences specifically defined in the Unicode Standard are sanctioned for standard use; all other sequences are undefined. No sequences containing combining characters or composite characters will be defined. The tables of standardized variants are listed in the Unicode Character Database in the file StandardizedVariants.html.

Combining Grapheme Joiner (U+034F)

    This new character is used to request that the two adjacent characters not be placed in separate grapheme clusters. (Note: the term "grapheme" has been replaced by "grapheme cluster" in the Unicode Standard.)

Word Joiner (U+2060)

    A new character has been added to take the place of the non-BOM usage of U+FEFF. That usage of U+FEFF will be deprecated, leaving only its usage as a BOM.

====================
New Properties
====================

The following new property files have been added:

- PropertyValueAliases and PropertyAliases
    These contain recommended UCD property names and property value names. These names can be used for XML formats of UCD data, for regular-expression property tests, and for other programmatic textual descriptions of Unicode data.

- DerivedAge
    This file shows when various code points were designated in successive versions of the Unicode Standard.

Other new properties include:

- Grapheme_Base, Grapheme_Extend, Grapheme_Link
    For programmatic determination of grapheme cluster boundaries.

- IDS_Binary_Operator, IDS_Trinary_Operator, Radical, Unified_Ideograph
    For programmatic determination of Ideographic Description Sequences.

- Default_Ignorable_Code_Point
    For programmatic determination of default-ignorable code points. New characters that should be ignored in processing (unless explicitly supported) will be assigned in these ranges, permitting programs to correctly handle future assignments of such characters.

- Deprecated
    For programmatic determination of deprecated characters. These characters will not be removed from the standard, but their usage is strongly discouraged.

Note: For consistency with the property naming conventions, in the data files the property BidiMirrored has been changed to Bidi_Mirrored, and the long form of Comp_Ex is used.

====================
Conformance
====================

Most notable is a further tightening of the definition of UTF-8, to eliminate irregular UTF-8.

====================
Known Issues
====================

Some of the data will be corrected over the course of the beta. In particular, the following will need further work:

- The values for Bidi Mirrored and Bidi Mirroring need to be completed.
- U+23B4..U+23B6 need changes to General Category and Line Break.
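
Illustration (Word Joiner): implementers who want to prepare existing text for the Word Joiner change described under New Characters might start from something like the following Python sketch. It keeps a U+FEFF at the very start of a string, on the assumption that it is a BOM there, and replaces every later U+FEFF with U+2060. The function name replace_non_bom_feff is hypothetical, and whether a leading U+FEFF really is a BOM depends on the protocol in use, so treat this as a starting point rather than a recommended procedure.

```python
BOM = "\ufeff"          # U+FEFF: to remain in use only as a byte order mark
WORD_JOINER = "\u2060"  # U+2060: new in Unicode 3.2


def replace_non_bom_feff(text: str) -> str:
    """Keep a leading U+FEFF and turn every later U+FEFF into U+2060.

    Assumption: a U+FEFF at index 0 is a BOM; any other occurrence is the
    non-BOM usage that U+2060 is meant to take over.
    """
    if text.startswith(BOM):
        return BOM + text[1:].replace(BOM, WORD_JOINER)
    return text.replace(BOM, WORD_JOINER)
```

For example, replace_non_bom_feff("\ufeffab\ufeffcd") leaves the first character alone and returns "\ufeffab\u2060cd".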
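
Illustration (UTF-8 conformance): the tightening mentioned under Conformance can be checked against a decoder. The Python sketch below assumes that "irregular UTF-8" refers to a supplementary character written as two three-byte sequences, one per surrogate, rather than as a single four-byte sequence; that reading is ours and is not spelled out in this notice. It relies on Python 3's built-in strict "utf-8" codec, which rejects such input. The helper name is_strict_utf8 is hypothetical.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+10000 in its regular four-byte form.
regular = b"\xf0\x90\x80\x80"

# U+10000 written as the surrogate pair D800 DC00, each surrogate encoded
# in three bytes (the form assumed here to be "irregular").
irregular = b"\xed\xa0\x80\xed\xb0\x80"

regular_ok = is_strict_utf8(regular)
irregular_ok = is_strict_utf8(irregular)
```

With a Python 3 interpreter, regular_ok is expected to be True and irregular_ok False; other decoders may behave differently and should be tested during the beta.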

