Larry Wall (larry-at-wall.org) wrote:
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
[1] Open questions:

1) Will graphemes have an unique charname?
   e.g. GRAPHEME LATIN SMALL LETTER A WITH DOT BELOW AND DOT ABOVE

Yes, presumably that comes with the "normalization" part of NFG.
We're not aiming for round-tripping of synthetic codepoints, just
as NFC doesn't do round-tripping of sequences that have precomposed
codepoints.  We're really just extending the NFC notion a bit further
to encompass temporary precomposed codepoints.
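To make "temporary precomposed codepoints" concrete, here is a rough Python sketch under stated assumptions. The class `SyntheticTable` and its methods are hypothetical names invented for illustration. Negative integers as synthetic IDs are also an arbitrary choice, not a claim about any Perl 6 implementation. A cluster is first put into NFC. If that yields a single real codepoint, it is used. Otherwise the NFC string is interned under a fresh synthetic ID. As with NFC, the original mark order is not round-tripped.

```python
import unicodedata


class SyntheticTable:
    """Hypothetical NFG-style table: clusters with no single precomposed
    codepoint get a temporary synthetic ID (negative, by assumption)."""

    def __init__(self):
        self._ids = {}      # NFC string -> synthetic id
        self._strings = {}  # synthetic id -> NFC string

    def codepoint(self, cluster: str) -> int:
        nfc = unicodedata.normalize("NFC", cluster)
        if len(nfc) == 1:
            # NFC already found a real precomposed codepoint; use it.
            return ord(nfc)
        if nfc not in self._ids:
            # Allocate the next unused synthetic ID: -1, -2, ...
            new_id = -(len(self._ids) + 1)
            self._ids[nfc] = new_id
            self._strings[new_id] = nfc
        return self._ids[nfc]

    def string(self, cp: int) -> str:
        # Synthetic IDs map back to their NFC string, not the original input.
        return self._strings[cp] if cp < 0 else chr(cp)


table = SyntheticTable()
x = table.codepoint("a\u0323\u0307")  # a + DOT BELOW + DOT ABOVE
y = table.codepoint("a\u0307\u0323")  # same marks in the other order
print(x == y)  # True: both orders normalize to the same synthetic codepoint
```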

The name is unique when you ask for it, but not when you specify it. Just as with code-point order, any combination that means the same thing should give the same grapheme, exactly as if you had created the code-point sequence first. Perhaps you don't realize that the different classes of modifiers are independent: you could say DOT ABOVE AND DOT BELOW and get the same thing as DOT BELOW AND DOT ABOVE.
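A small Python sketch of "unique when asked for". `grapheme_name` is a hypothetical function, and its "GRAPHEME ... AND ..." format simply copies the example in question 1. It is not a Unicode or Perl 6 naming scheme. Because the cluster is put into NFC before it is named, the dot-below/dot-above ordering in the input doesn't affect the resulting name.

```python
import unicodedata


def grapheme_name(cluster: str) -> str:
    """Hypothetical canonical name for a grapheme cluster (format assumed)."""
    nfc = unicodedata.normalize("NFC", cluster)
    base, marks = nfc[0], nfc[1:]
    name = unicodedata.name(base)
    if not marks:
        # A real (possibly precomposed) codepoint keeps its Unicode name.
        return name
    # Drop the "COMBINING " prefix so "COMBINING DOT ABOVE" reads "DOT ABOVE".
    mark_names = [unicodedata.name(m).removeprefix("COMBINING ") for m in marks]
    joiner = " AND " if " WITH " in name else " WITH "
    return "GRAPHEME " + name + joiner + " AND ".join(mark_names)


print(grapheme_name("a\u0307\u0323"))
```

Either input order prints `GRAPHEME LATIN SMALL LETTER A WITH DOT BELOW AND DOT ABOVE`. Accepting several spellings on input, as described above, would need a separate parser that normalizes before lookup.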



2) Can I use Unicode property matching safely with graphemes?
   If yes, who or what maintains the necessary tables?

Good question.  My assumption is that adding marks to a character
doesn't change its fundamental nature.  What needs to be provided
other than pass-through to the base character's properties?

It depends on the property! Being a modifier, for example. A detailed look would be needed to decide which properties just pass through to the base char, which are enhanced (e.g. "letter" becomes "letter with modifiers"), which don't make sense for a grapheme, which are mostly fine but occasionally change, etc.
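One way to picture the per-property decision is this Python sketch. `grapheme_properties` and the policy assigned to each property are assumptions made for illustration, not a worked-out proposal. General category passes through from the base character. "Is a letter" is enhanced with a "has marks" flag. The canonical combining class has no single sensible value for the whole grapheme, so it is kept per mark.

```python
import unicodedata


def grapheme_properties(cluster: str) -> dict:
    """Hypothetical per-property policy for a grapheme cluster."""
    # NFD splits precomposed characters and puts marks in canonical order.
    nfd = unicodedata.normalize("NFD", cluster)
    base, marks = nfd[0], nfd[1:]
    base_cat = unicodedata.category(base)
    return {
        # Pass-through: general category comes from the base character.
        "category": base_cat,
        # Enhanced: "letter" plus whether modifiers are attached.
        "is_letter": base_cat.startswith("L"),
        "has_marks": bool(marks),
        # Not meaningful as one value for the grapheme, so kept per mark.
        "mark_combining_classes": [unicodedata.combining(m) for m in marks],
    }


print(grapheme_properties("a\u0307\u0323"))
```

Here the marks come out as combining classes 220 (DOT BELOW) and 230 (DOT ABOVE) in canonical order, whatever order they were typed in.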

