Spec

Mark J. Reed Fri, 30 Jan 2009 04:17:07 -0800

On Fri, Jan 30, 2009 at 6:30 AM, Darren Duncan <[email protected]> wrote:
> [email protected] wrote:
>>
>> By default Perl presents Unicode in "NFG" formation, where each grapheme
>> counts as one character.  A grapheme is what the novice user would think
>> of as a character in their normal everyday life, including any diacritics.
>
> What's with this NFG / Normal Form G that you refer to?  I don't see any
> mention of that in http://unicode.org/reports/tr15/ ... did you mean NFC?


As far as I can tell, NFG isn't an official Unicode Normalization
Form; it's an HLL thing, and it has nothing to do with code points.
When you ask Perl6 for one "character", what you get back (by default)
is one "grapheme" - presumably as defined by UAX #29 - which may be
one or more code points, and who knows how many bytes it winds up
encoded as in memory.

AppleScript 2.0 takes this approach as well.
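
To make the three levels concrete, here is a rough Python sketch (not
Perl 6).  The graphemes() helper is a hypothetical name of mine, and it
only approximates UAX #29 by attaching combining marks to the preceding
base code point; the real segmentation rules cover much more.

```python
import unicodedata

def graphemes(text):
    """Approximate grapheme clusters: a base code point plus any
    combining marks that follow it. Assumption: this is NOT the full
    UAX #29 algorithm, just enough to illustrate the idea."""
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a nonzero class for combining marks
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# 'e' followed by COMBINING ACUTE ACCENT (U+0301), then a plain 'a'
s = "e\u0301a"

grapheme_count = len(graphemes(s))     # what a user would call characters
codepoint_count = len(s)               # Python strings count code points
byte_count = len(s.encode("utf-8"))    # depends entirely on the encoding

print(grapheme_count, codepoint_count, byte_count)
```

The same string gives three different lengths depending on which unit
you count, and only the first matches the user's idea of a "character".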

So are there any non-opaque, non-string grapheme representations?
Does ord() work on them?  In AS, the equivalent function is allowed to
return a list of numbers instead of just a single number; in either
case, the value can be passed to the chr() equivalent to get the same
grapheme back.
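
For illustration only, that ord()/chr() behaviour could look something
like the following Python sketch.  The names grapheme_ord() and
grapheme_chr() are hypothetical; this is neither the AppleScript API nor
anything the Perl 6 spec defines.

```python
def grapheme_ord(grapheme):
    """Return a single number for a one-code-point grapheme,
    otherwise a list of code point numbers (hypothetical helper)."""
    points = [ord(c) for c in grapheme]
    return points[0] if len(points) == 1 else points

def grapheme_chr(value):
    """Inverse of grapheme_ord(): accept a number or a list of numbers
    and rebuild the grapheme (hypothetical helper)."""
    if isinstance(value, int):
        return chr(value)
    return "".join(chr(p) for p in value)

single = grapheme_ord("a")            # one code point
multi = grapheme_ord("q\u0303")       # 'q' + COMBINING TILDE, two code points

# Either kind of value round-trips to the grapheme it came from.
assert grapheme_chr(single) == "a"
assert grapheme_chr(multi) == "q\u0303"
```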

> For that matter, is it possible for all realistic combinations of diacritics
> and base letters to be represented by a single Unicode codepoint, including
> all language-dependent graphemes?

Absolutely not.  Again, nobody said anything about "code points".
We're talking about Perl6's idea of "characters".
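
A quick way to see this, sketched in Python rather than Perl 6: NFC
composes a sequence only when Unicode has a precomposed code point for
it.  The choice of "q" plus a combining tilde is just my example of a
sequence that has none.

```python
import unicodedata

# 'e' + COMBINING ACUTE ACCENT has a precomposed form (U+00E9),
# so NFC collapses it to a single code point.
e_acute = unicodedata.normalize("NFC", "e\u0301")

# 'q' + COMBINING TILDE has no precomposed form, so NFC leaves it
# as two code points even though it is one grapheme.
q_tilde = unicodedata.normalize("NFC", "q\u0303")

print(len(e_acute), len(q_tilde))
```

So a grapheme-based "character" cannot in general be reduced to one
code point by normalization alone.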

-- 
Mark J. Reed <[email protected]>

Re: r25122 - docs/Perl6/Spec
