t/spec/S02-builtin_data_types/unicode.t has tests like this:

# LATIN CAPITAL LETTER A, COMBINING GRAVE ACCENT
my Str $u = "\x[0041,0300]";
is $u.bytes, 3, 'combining À is three bytes as utf8';
is $u.codes, 2, 'combining À is two codes';
is $u.graphs, 1, 'combining À is one graph';

This seems to imply that a Str remembers its codepoints, even though it
is in grapheme mode (because that's the default).

Is this correct? I don't really think that's sensible. I'd expect a
compiler to store strings in composed normalization (+ NFG), so $u.codes
would be 1.
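
To make the difference concrete, here is a small sketch in Python (not Perl 6, and not a statement about what any Perl 6 implementation does) that counts UTF-8 bytes and codepoints for the same string before and after NFC composition. The helper name counts is made up for this illustration; unicodedata.normalize is from Python's standard library.

```python
import unicodedata

# LATIN CAPITAL LETTER A followed by COMBINING GRAVE ACCENT,
# i.e. the same two codepoints as "\x[0041,0300]" in the test.
decomposed = "\u0041\u0300"

# The same text after canonical composition (NFC).
composed = unicodedata.normalize("NFC", decomposed)

def counts(s):
    """Return (utf8_bytes, codepoints) for a string (illustrative helper)."""
    return len(s.encode("utf-8")), len(s)

print(counts(decomposed))  # the string exactly as written in the test
print(counts(composed))    # the string as a composing implementation would store it
```

If strings were stored composed, the byte count would change along with the codepoint count, so the .bytes test above would be affected as well as the .codes one.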

Cheers,
Moritz
