There's still a problem between these "clarified"
definitions, introduced by D14:
"a combining character is a graphic character" means it must be
a graphic character, and this excludes character category "Cf".
"Combining characters consist of all characters with the
General Category values of Spacing Combining Mark (Mc), Non-Spacing Mark (Mn),
and Enclosing Mark (Me)."
Thanks to this, all graphic characters now fall into a complete and
exclusive partition as either base characters or combining characters. The
whole set of Unicode code points is then partitioned into three classes:
base characters, combining characters, and non-graphic characters (the latter
including noncharacters).
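A minimal Python sketch of this three-way partition, using only the General Category as D13a, D13b and D14 do. It relies on the standard `unicodedata` module, whose tables reflect the Unicode version bundled with the interpreter rather than the 2003 data. The `private_use_is_graphic` flag is a hypothetical stand-in for "private agreement" on Co characters.

```python
import unicodedata

# Major General Category classes that D13a counts as graphic.
# Zs is handled separately: Zl and Zp are not graphic.
GRAPHIC_MAJOR = {"L", "M", "N", "P", "S"}

def classify(ch: str, private_use_is_graphic: bool = True) -> str:
    """Place one code point in 'base', 'combining' or 'non-graphic'.

    private_use_is_graphic is a hypothetical switch modelling private
    agreement; by default Co is treated as a graphic base character.
    """
    gc = unicodedata.category(ch)
    if gc == "Co":
        return "base" if private_use_is_graphic else "non-graphic"
    if gc[0] == "M":          # Mn, Mc, Me
        return "combining"
    if gc[0] in GRAPHIC_MAJOR or gc == "Zs":
        return "base"
    return "non-graphic"      # Zl, Zp, Cc, Cf, Cs, Cn (incl. noncharacters)

for cp in (0x0041, 0x0301, 0x034F, 0x200C, 0x200D, 0xFFFF):
    print(f"U+{cp:04X}", classify(chr(cp)))
```

With this sketch, CGJ (U+034F, Mn) lands among the combining characters, while ZWNJ and ZWJ (U+200C, U+200D, Cf) land among the non-graphic characters.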
As a combining character sequence is made only of an
optional base character followed by combining characters, it can only include
graphic characters, and so all non-graphic characters ("gc=C*", except
"gc=Co" in the absence of private agreement) are excluded from every
occurrence of a combining character sequence.
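A sketch of that membership test, assuming the reading above (an optional base character followed by one or more combining characters, nothing else) and treating Co as a base character absent private agreement. The helper names are hypothetical; only `unicodedata.category` is a real library call.

```python
import unicodedata

def is_graphic_base(ch: str) -> bool:
    # D13b: graphic but not M; Co counts as base absent private agreement.
    gc = unicodedata.category(ch)
    return gc[0] in "LNPS" or gc in ("Zs", "Co")

def is_combining(ch: str) -> bool:
    # D14: General Category Mn, Mc or Me.
    return unicodedata.category(ch)[0] == "M"

def is_combining_sequence(s: str) -> bool:
    """True if s is an optional base character followed by one or more
    combining characters, and contains nothing else."""
    chars = list(s)
    if chars and is_graphic_base(chars[0]):
        chars = chars[1:]
    return len(chars) > 0 and all(is_combining(c) for c in chars)

print(is_combining_sequence("e\u034F\u0301"))  # CGJ (Mn) inside the sequence
print(is_combining_sequence("e\u200D\u0301"))  # ZWJ (Cf) breaks the sequence
```

Under these definitions, inserting CGJ keeps the string a single combining character sequence, while inserting ZWJ splits it.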
ZWJ and ZWNJ are therefore excluded from any combining character
sequence, but CGJ is not, because it is a combining character (Mn) and thus a
graphic character.
The problem:
- CGJ, now "clearly" a graphic character, should not be
prevented from having both a graphic behavior (needed for Hebrew) and a
semantic effect (so that it affects collation or text transformations such as
case mappings or other foldings), even though it is invisible and has no
associated glyph (D13a, point 3: "Not all graphic characters have visibly
rendered glyphs. Particular examples include spaces and some combining
marks.")...
- ZWJ and ZWNJ (Cf) are not graphic characters, but the
way they are used in Khmer does not obey these definitions, as they
participate in combining character sequences...
----- Original Message -----
Sent: Sunday, November 09, 2003 12:52 AM
Subject: Re: ZWJ, ZWNJ, CGJ and combination
The UTC just approved a
clarification of the base character definition, as follows:
D13a Graphic character: a character with the General Categories of
Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or
Space Separator (Zs).
- Graphic characters specifically exclude the line and paragraph
separators (Zl, Zp) and exclude the characters with the General Categories
of Other (Cn, Cs, Cc, Cf).
- For more information, see Chapter 2, especially Section 2.4 Code Points
and Characters and Table 2-2 Types of Code Points.
- Not all graphic characters have visibly rendered glyphs. Particular
examples include spaces and some combining marks.
- The interpretation of private use characters (Co) as graphic characters
or not is determined by private agreement. However, in the absence of
private agreement, private use characters should be interpreted as graphic
characters.
D13b Base character: any graphic character except for those with the
General Category of Combining Mark (M).
- Most Unicode characters are base characters. A base character is any
code point that has one of the General Categories of Letter (L), Number (N),
Punctuation (P), Symbol (S), or Space Separator (Zs).
- Base characters are independent graphic characters, but this does not
preclude the presentation of base characters from adopting different
contextual forms or participating in ligatures.
- The interpretation of private use characters (Co) as base characters or
not is determined by private agreement. However, in the absence of private
agreement, private use characters should be interpreted as base characters.
D14 Combining character: a graphic character with the General
Category of Combining Mark (M).
- The graphic positioning of a combining character depends on the last
preceding base character. The combining character is said to apply to
that base character.
- Combining characters consist of all characters with the General Category
values of Spacing Combining Mark (Mc), Non-Spacing Mark (Mn), and Enclosing
Mark (Me).
- All characters with non-zero canonical combining class (ccc) are
combining characters, but the reverse is not the case: there are combining
characters with a zero canonical combining class.
- The interpretation of Private Use characters (Co) as combining
characters or not is determined by private agreement.
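The last D14 bullets about canonical combining class can be spot-checked by looking up a code point's General Category together with its ccc, which `unicodedata.combining()` returns as an integer. A small sketch, assuming the Unicode data bundled with Python 3; the function names are illustrative only.

```python
import unicodedata

def combining_profile(ch: str):
    """Return (General Category, canonical combining class) for one code point."""
    return unicodedata.category(ch), unicodedata.combining(ch)

def satisfies_d14_ccc_rule(ch: str) -> bool:
    """D14 bullet: a non-zero ccc implies a Combining Mark (M*) category."""
    gc, ccc = combining_profile(ch)
    return ccc == 0 or gc.startswith("M")

# U+0301 acute, U+034F CGJ, U+0903 Devanagari visarga, U+20DD enclosing circle
for cp in (0x0301, 0x034F, 0x0903, 0x20DD):
    print(f"U+{cp:04X}", combining_profile(chr(cp)))
```

CGJ, the visarga (Mc) and the enclosing circle (Me) are combining characters with ccc 0, which is the "reverse is not the case" half of the bullet.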
Mark
__________________________________
http://www.macchiato.com

----- Original Message -----
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sat, 2003 Nov 08 11:58
Subject: ZWJ, ZWNJ, CGJ and combination

> Are the characters ZWJ, ZWNJ and CGJ base characters, combining
> characters, neither, or even both? Which specific character properties
> should I look at to decide this?
>
> Are these characters legal within combining character sequences? Can ZWJ
> and ZWNJ be used to control ligation of combining characters? If not, is
> there an alternative mechanism for this?
>
> --
> Peter Kirk
> [EMAIL PROTECTED] (personal)
> [EMAIL PROTECTED] (work)
> http://www.qaya.org/