Re: "Unicode in 'NFG' formation" ?

Helmut Wollmersdorfer Wed, 20 May 2009 00:06:20 -0700

Larry Wall wrote:

On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:

2) Can I use Unicode property matching safely with graphemes?
   If yes, who or what maintains the necessary tables?

Good question.  My assumption is that adding marks to a character
doesn't change its fundamental nature.  What needs to be provided
other than pass-through to the base character's properties?

This will work in most cases, but not, for example, with the property ASCII_Hex_Digit.


LATIN SMALL LETTER A is ASCII_Hex_Digit
but

GRAPHEME LATIN SMALL LETTER A WITH DOT BELOW AND DOT ABOVE is not ASCII_Hex_Digit
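
To make the counterexample concrete, here is a minimal sketch in Python (not Perl 6, and not a proposal for the actual implementation). The two function names are hypothetical and exist only for illustration; the ASCII_Hex_Digit set is written out by hand because Python's unicodedata module does not expose that property.

```python
import unicodedata

# ASCII_Hex_Digit covers exactly these 22 code points: 0-9, A-F, a-f.
ASCII_HEX_DIGITS = frozenset("0123456789ABCDEFabcdef")


def passthrough_is_ascii_hex_digit(grapheme: str) -> bool:
    """Hypothetical pass-through rule: consult only the base code point."""
    decomposed = unicodedata.normalize("NFD", grapheme)
    return bool(decomposed) and decomposed[0] in ASCII_HEX_DIGITS


def strict_is_ascii_hex_digit(grapheme: str) -> bool:
    """Hypothetical stricter rule: any attached mark cancels the property."""
    return len(grapheme) == 1 and grapheme in ASCII_HEX_DIGITS


plain = "a"
marked = "a\u0323\u0307"  # a + COMBINING DOT BELOW + COMBINING DOT ABOVE

# Both rules agree on the plain letter; they disagree on the marked grapheme.
print(passthrough_is_ascii_hex_digit(plain), strict_is_ascii_hex_digit(plain))
print(passthrough_is_ascii_hex_digit(marked), strict_is_ascii_hex_digit(marked))
```

The pass-through rule reports the marked grapheme as a hex digit, which is the wrong answer for this property; the stricter rule does not.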

I will try to generate a few million cases based on nfc(nfd($string)) to find out the best inheritance rules.
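
A rough sketch of what such a survey could look like, again in Python rather than Perl. It is only an assumption about the approach: it checks single precomposed code points instead of generated strings, and it compares only General_Category, since that is the property Python's unicodedata exposes directly. The helper names roundtrip and category_mismatches are made up for this example.

```python
import unicodedata


def roundtrip(text: str) -> str:
    """Python analogue of nfc(nfd($string))."""
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", text))


def category_mismatches(start: int, stop: int):
    """Yield (composed, base) pairs whose General_Category values differ."""
    for cp in range(start, stop):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cs":
            continue  # skip lone surrogates
        if roundtrip(ch) != ch:
            continue  # not stable under NFC(NFD(x)), e.g. singleton mappings
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) < 2:
            continue  # no canonical decomposition, nothing to inherit
        base = decomposed[0]
        if unicodedata.category(ch) != unicodedata.category(base):
            yield ch, base


# Survey a small range; widen it to cover more of the code space.
hits = list(category_mismatches(0x00C0, 0x0250))
print(len(hits), "category mismatches in U+00C0..U+024F")
```

Every pair the generator yields is a case where simple pass-through to the base character would give a different answer than the composed character's own table entry.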

4) Should the definition of graphemes conform to Unicode Standard Annex #29 'grapheme clusters'? Which level: legacy, extended or tailored?

No opinion, other than that we're aiming for the most modern
formulation that doesn't implicitly cede declarational control to
something out of the control of Perl 6 declarations.  (See locales for
an example of something Perl 6 ignores in the absence of an explicit
declaration to pay attention to them.)  So just guessing from the
names without reading the Annex in question, not legacy, but probably
extended, with explicit tailoring allowed by declaration.  (Unless
extended has some dire performance or policy consequences that would
be contraindicative...)


I will look into what ICU supports.

So as long as we stay inside these fundamental Perl 6 design
principles, feel free to whack on the specs.


OK. Hopefully some native users of Indic, Arabic and Asian scripts will review this.

Helmut Wollmersdorfer
