In the abstract, Perl is written in Unicode, and has consistent Unicode
-semantics regardless of the underlying text representations.
+semantics regardless of the underlying text representations. By default
+Perl presents Unicode in "NFG" formation, where each grapheme counts as
+one character. A grapheme is what the novice user would think of as a
+character in their normal everyday life, including any diacritics.
What is this NFG / Normal Form G that you refer to? I don't see any mention
of it in UAX #15 (http://unicode.org/reports/tr15/). Did you mean NFC?
For that matter, is it possible for all realistic combinations of diacritics and
base letters to be represented by a single Unicode codepoint, including all
I thought NFC more or less gave one codepoint per grapheme, with a few
exceptions, though I could be wrong on that point.
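
To make the NFC question concrete, here is a minimal sketch in Python (chosen
only because its standard unicodedata module exposes the normalization forms;
it says nothing about how Perl would implement this). The helper name
nfc_codepoints is hypothetical. It counts codepoints after NFC for one
base-plus-diacritic pair that has a precomposed form and one that, as far as I
know, does not:

```python
import unicodedata


def nfc_codepoints(text):
    """Return how many codepoints remain after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


# "e" + COMBINING ACUTE ACCENT: Unicode has a precomposed U+00E9,
# so NFC composes the pair into a single codepoint.
e_acute = "e\u0301"

# "q" + COMBINING TILDE: no precomposed codepoint exists for this pair,
# so NFC leaves it as two codepoints even though it is one grapheme.
q_tilde = "q\u0303"

print(nfc_codepoints(e_acute))
print(nfc_codepoints(q_tilde))
```

If that is right, NFC alone cannot guarantee one codepoint per grapheme, and
anything that counts graphemes as characters needs something beyond NFC.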
-- Darren Duncan