On Tue, Apr 28, 2009 at 2:27 PM, Patrick R. Michaud <pmich...@pobox.com> wrote:
>> According to the 5.0.0 standard, section 4.8:
>>
>> "Unicode character names contain only uppercase Latin letters A
>> through Z, digits, space, and hyphen-minus."
>>
>> So it seems the notes in parentheses are not considered part of the char 
>> name.
>
> Countering this, though:
>
> * The XML schema for the "Unicode Character Database in XML" [1]
>  seems to allow parens in the character name property:
>
>    character-name = xsd:string { pattern="([A-Z0-9 #\-\(\)]*)|(<control>)" }

Also '#', though I see no character names containing that symbol.

But all the parentheses I see in the list of character names
surround lowercase letters, which are explicitly disallowed not
only by the spec I quoted, but also by the XML schema definition you
quote above.  E.g.:

00C6 LATIN CAPITAL LETTER AE (ash)
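To make that concrete, here's a quick Python check. It's my own sketch, not anything from the Unicode tooling, and it applies the schema pattern quoted above to a few names. XSD patterns are implicitly anchored to the whole value, so I'm assuming re.fullmatch is the right stand-in; is_valid_name is just a name I made up:

```python
import re

# Pattern copied from the quoted XML schema. XSD "pattern" facets must
# match the entire value, so re.fullmatch is used to mimic that anchoring.
CHARACTER_NAME = re.compile(r"([A-Z0-9 #\-\(\)]*)|(<control>)")

def is_valid_name(name: str) -> bool:
    """Hypothetical helper: True if name satisfies the schema pattern."""
    return CHARACTER_NAME.fullmatch(name) is not None

print(is_valid_name("LATIN CAPITAL LETTER AE"))        # True
print(is_valid_name("LATIN CAPITAL LETTER AE (ash)"))  # False: lowercase "ash"
print(is_valid_name("<control>"))                      # True
```

The parens are allowed by the character class, but the lowercase "ash" inside them isn't, so the annotated form fails.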

> * The Unicode character name database [2] has parens in the
>  name property field for many characters
>
>    000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;

That's not the name property field.  The Unicode character name is
field 1 ("<control>", in this case).  The field whose value is "LINE
FEED (LF)" is the Unicode_1_Name field (field 10), which for control
characters supplies the ISO 6429 name.

-- 
Mark J. Reed <markjr...@gmail.com>
