Unicode 3.2 BETA

Mark Davis Tue, 11 Dec 2001 19:31:07 -0800

The next version of the Unicode Standard will be Version 3.2.0, due for
release in March 2002. The beta period for this version will run until
January XXX, 2002.


During this beta period, updated Unicode Character Database files
are available for public comment. We strongly encourage implementers to
download these files and test them with their programs well before the end
of the beta period. The files are located at
http://www.unicode.org/Public/BETA/

This version adds 1016 new characters, new properties, additional
conformance clauses, and textual clarifications.

====================
New Characters
====================

The primary feature of Unicode 3.2 is the addition of 1016 new encoded
characters. These additions consist of several Philippine scripts (Tagalog,
Hanunoo, Buhid, Tagbanwa), a large collection of mathematical symbols, and
small sets of other letters and symbols.

Architectural additions include:

Variation Selectors
The variation selector selects a different appearance of an already encoded
character. It is not intended as a general code extension mechanism. Only
the sequences specifically defined in the Unicode Standard are sanctioned
for standard use; all other sequences are undefined. No sequences containing
combining characters or composite characters will be defined. The tables of
standardized variants are listed in the Unicode Character Database in the
file StandardizedVariants.html.

Combining Grapheme Joiner (U+034F)
This new character requests that the two adjacent characters not be placed
in separate grapheme clusters. (Note: the term "grapheme" has been
replaced by "grapheme cluster" in the Unicode Standard.)

Word Joiner (U+2060)
A new character has been added to take the place of the non-BOM usage of
U+FEFF. That usage of U+FEFF will be deprecated, leaving only its use as
a BOM.
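
As a rough illustration of the migration this implies, the sketch below
keeps a leading U+FEFF (treated as a BOM) and replaces any later U+FEFF
with U+2060. The function name is hypothetical, and whether a given
application should rewrite text this way is an assumption, not something
stated in the standard.

```python
BOM = "\ufeff"          # U+FEFF: treated as a byte order mark when it comes first
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER, added in Unicode 3.2


def replace_non_bom_feff(text: str) -> str:
    """Keep a leading U+FEFF; replace every later U+FEFF with U+2060.

    Hypothetical helper for illustration only.
    """
    if text.startswith(BOM):
        # Preserve the initial BOM and rewrite only the remainder.
        return BOM + text[1:].replace(BOM, WORD_JOINER)
    return text.replace(BOM, WORD_JOINER)
```

For example, replace_non_bom_feff("ab\ufeffcd") yields "ab\u2060cd", while
a U+FEFF at the very start of the string is left in place.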

====================
New Properties
====================

The following new property files have been added:

- PropertyValueAliases and PropertyAliases
These contain recommended UCD property names and property value names. These
names can be used for XML formats of UCD data, for regular-expression
property tests, and for other programmatic textual descriptions of Unicode
data.

- DerivedAge
This file shows when various code points were designated in successive
versions of the Unicode Standard.
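
A file such as DerivedAge could be consumed along the following lines. This
is a minimal sketch that assumes the usual UCD text layout (a code point or
range, a semicolon, a value, and an optional "#" comment); the sample lines
are illustrative and are not copied from the beta file, and the function
names are hypothetical.

```python
from typing import List, Optional, Tuple

# Illustrative sample only; the layout of the real beta file is an assumption.
SAMPLE = """\
# hypothetical excerpt
0041..005A ; 1.1 # LATIN CAPITAL LETTER A..Z
034F       ; 3.2 # COMBINING GRAPHEME JOINER
2060       ; 3.2 # WORD JOINER
"""


def parse_ranges(text: str) -> List[Tuple[int, int, str]]:
    """Parse 'XXXX[..YYYY] ; value # comment' lines into (start, end, value)."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        code_field, value = (field.strip() for field in line.split(";", 1))
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point if no range
        ranges.append((start, end, value))
    return ranges


def age_of(cp: int, ranges: List[Tuple[int, int, str]]) -> Optional[str]:
    """Return the version string for cp, or None if it is not listed."""
    for start, end, value in ranges:
        if start <= cp <= end:
            return value
    return None
```

With the sample above, age_of(0x2060, parse_ranges(SAMPLE)) returns "3.2",
and a code point that is not listed returns None.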

Other new properties include:

- Grapheme_Base, Grapheme_Extend, Grapheme_Link
For programmatic determination of grapheme cluster boundaries.

- IDS_Binary_Operator, IDS_Trinary_Operator, Radical, Unified_Ideograph
For programmatic determination of Ideographic Description Sequences.

- Default_Ignorable_Code_Point
For programmatic determination of default-ignorable code points. New
characters that should be ignored in processing (unless explicitly
supported) will be assigned in the ranges that have this property,
permitting programs to handle future assignments of such characters
correctly.

- Deprecated
For programmatic determination of deprecated characters. These characters
will not be removed from the standard, but their usage is strongly
discouraged.

Note: For consistency with the property naming conventions, in the data
files the property BidiMirrored has been changed to Bidi_Mirrored, and the
long form of Comp_Ex is used.
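
To illustrate how Default_Ignorable_Code_Point lets a program cope with
characters assigned after it was written, here is a minimal sketch. The
ranges in the table are an assumed, illustrative subset; the authoritative
values are whatever the UCD data files list, and the function names are
hypothetical.

```python
# Assumed, illustrative subset of ranges; real values come from the UCD files.
ASSUMED_IGNORABLE_RANGES = [
    (0x034F, 0x034F),  # COMBINING GRAPHEME JOINER
    (0x2060, 0x206F),  # WORD JOINER and neighbouring code points
    (0xFE00, 0xFE0F),  # variation selectors
]


def is_default_ignorable(cp: int) -> bool:
    """True if cp falls in one of the assumed default-ignorable ranges."""
    return any(start <= cp <= end for start, end in ASSUMED_IGNORABLE_RANGES)


def strip_ignorable(text: str) -> str:
    """Drop every character whose code point is default-ignorable."""
    return "".join(ch for ch in text if not is_default_ignorable(ord(ch)))
```

Because the test is range-based, a code point inside a listed range is
treated as ignorable even if this program knows nothing else about it, which
is the forward-compatibility behavior the property is meant to enable.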

====================
Conformance
====================

The most notable change is a further tightening of the definition of UTF-8
to eliminate irregular UTF-8 sequences.
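
As an illustration, assume that "irregular UTF-8" here means a supplementary
character encoded as two three-byte surrogate sequences (six bytes in total)
instead of a single four-byte sequence. Under that assumption, a strict
decoder rejects the six-byte form. The sketch below relies on Python's
built-in UTF-8 codec, which is strict in this respect; the helper name is
hypothetical.

```python
REGULAR = b"\xf0\x90\x80\x80"            # U+10000 as one four-byte sequence
IRREGULAR = b"\xed\xa0\x80\xed\xb0\x80"  # U+D800 and U+DC00, three bytes each


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Here is_valid_utf8(REGULAR) is True and is_valid_utf8(IRREGULAR) is False,
so only the four-byte form is accepted.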

====================
Known Issues
====================

Some of the data will be corrected over the course of the beta. In
particular, the following will need further work:

- The values for Bidi Mirrored and Bidi Mirroring need to be completed.
- U+23B4..U+23B6 need changes to General Category and Line Break.
