Re: Does a string remember all Unicode levels?

Helmut Wollmersdorfer Wed, 12 Aug 2009 00:17:11 -0700

Moritz Lenz wrote:

> t/spec/S02-builtin_data_types/unicode.t has tests like this:
>
> # LATIN CAPITAL LETTER A, COMBINING GRAVE ACCENT
> my Str $u = "\x[0041,0300]";
> is $u.bytes, 3, 'combining À is three bytes as utf8';
> is $u.codes, 2, 'combining À is two codes';
> is $u.graphs, 1, 'combining À is one graph';
>
> Which seems to imply that a Str remembers its codepoints, even if it is
> in grapheme mode (because that's the default).
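As an aside, the byte and code-point counts from this test can be checked outside Perl 6 with Python's standard `unicodedata` module. This is an illustration in a different language, not a statement about Perl 6 behaviour. Python's standard library has no grapheme segmentation, so the `.graphs` count is not reproduced. The last lines show what storing only the composed (NFC) form would do to the code-point count.

```python
import unicodedata

# "A" followed by U+0300 COMBINING GRAVE ACCENT, as in the test above
u = "\u0041\u0300"

print(len(u.encode("utf-8")))  # bytes as UTF-8 -> 3
print(len(u))                  # code points -> 2

# NFC composes the pair into U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
nfc = unicodedata.normalize("NFC", u)
print(len(nfc), hex(ord(nfc)))  # -> 1 0xc0
print(len(nfc.encode("utf-8")))  # -> 2
```

So a string that keeps only the composed form would report one code point and two UTF-8 bytes for the same visible character.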

IMHO it's necessary to store the original string as given. Conversion to NFG should be lazy.
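A minimal sketch of the "store the original, normalize lazily" idea, written in Python. The class `LazyNormalizedStr` is hypothetical and not part of any Perl 6 design; NFC and NFD stand in for NFG, which Python does not implement. The original code points are kept untouched, and each normalized view is computed only on first access and then cached, which also gives a decomposition (NFD) a place to live next to the original.

```python
import unicodedata
from functools import cached_property  # Python 3.8+


class LazyNormalizedStr:
    """Hypothetical string wrapper: keeps the code points exactly as given
    and computes a normalized view only when it is first requested.
    NFC/NFD stand in for NFG, which Python has no implementation of."""

    def __init__(self, original: str) -> None:
        self.original = original  # never modified

    @cached_property
    def nfc(self) -> str:
        # Computed on first access, then cached on the instance.
        return unicodedata.normalize("NFC", self.original)

    @cached_property
    def nfd(self) -> str:
        # The decomposed view is stored alongside, not instead of, the original.
        return unicodedata.normalize("NFD", self.original)

    def codes(self) -> int:
        # Count code points of the original, not of a normalized view.
        return len(self.original)

    def __str__(self) -> str:
        # Writing the string back out reproduces the input unchanged.
        return self.original
```

With `s = LazyNormalizedStr("A\u0300")`, `s.codes()` is 2, `len(s.nfc)` is 1, and `str(s)` returns the two-code-point input unchanged.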

> Is this correct? I don't really think that's sensible. I'd expect a
> compiler to store strings in composed normalization (+ NFG), so $u.codes
> would be 1.

If a string always stores only NFG, where can we store the result of a decomposition (NFD)?

Also, it would be very confusing if a developer just reads a file, filters the lines, and writes them back, only to find the output in a different normalization form.
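The round-trip concern can be simulated in a few lines of Python, using in-memory bytes instead of a real file. The helper `filter_lines` and its `normalize_on_read` flag are hypothetical; the flag imitates a string type that keeps only the composed (NFC) form. The kept line looks identical in both outputs, but its bytes differ.

```python
import unicodedata

# Bytes of a "file" whose text is stored decomposed: "A" + U+0300
original_bytes = "A\u0300 keep\nskip me\n".encode("utf-8")


def filter_lines(data: bytes, normalize_on_read: bool) -> bytes:
    """Decode, drop lines containing 'skip', and re-encode."""
    text = data.decode("utf-8")
    if normalize_on_read:
        # Simulates a string type that stores only the composed (NFC) form
        text = unicodedata.normalize("NFC", text)
    kept = [line for line in text.splitlines(keepends=True) if "skip" not in line]
    return "".join(kept).encode("utf-8")


preserved = filter_lines(original_bytes, normalize_on_read=False)
normalized = filter_lines(original_bytes, normalize_on_read=True)

print(preserved == "A\u0300 keep\n".encode("utf-8"))   # True
print(normalized == "A\u0300 keep\n".encode("utf-8"))  # False: now starts with U+00C0
```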


Helmut Wollmersdorfer
