On Fri, Sep 29, 2017 at 2:04 PM, ToddAndMargo <toddandma...@zoho.com> wrote:

> $ perl6 -e 'my $x="abc"; $x ~= "def"; say $x;'
> abcdef
>
> Perfect!  Thank you!
>
> I am slowly getting away from my Modula 2 "Array of Characters" days.
>
> Question:  Is there a pretty way like the above to do a prepend?
>

No, sorry. The downside of how strings are stored is that appending is
easy, but prepending is hard to do without moving the existing contents
around in memory. (The future may have string-aggregate types, commonly
known as "ropes", which would make this easier.)
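
To spell out the non-pretty way: a minimal sketch that prepends by building
a new string with the ~ concatenation operator and assigning it back. The
variable name is only for illustration.

```perl6
my $x = "def";

# Concatenate explicitly, new text first, and assign the result back.
# This builds a new string rather than growing the old one in place.
$x = "abc" ~ $x;

say $x;    # abcdef
```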

As for "array of characters", this quickly turns into a problem --- and
languages that take the 'array of characters' approach are still struggling
with it.

Quick: how many 'characters' is "ň"?

Turns out the answer is: one grapheme OR two Unicode codepoints in
decomposed form (Latin small letter n, combining caron) OR 3 bytes in UTF-8
encoding (1 byte for the n, 2 bytes for the combining caron). (Among
others. Java sees two 16-bit code units in UTF-16 encoding, for 4 bytes. C
wants a trailing NUL character, adding a 4th byte to the UTF-8 form.) And
which one is the correct way to look at it depends on what you are doing
with it.
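
A rough sketch of how those different counts can be asked for in Perl 6,
assuming a Rakudo where Str offers .chars, .NFD, .NFC and .encode. The
string is written as n plus a combining caron, the decomposed form counted
above. The byte count is left without a firm expected value because it
depends on which normalization form gets encoded.

```perl6
# LATIN SMALL LETTER N followed by COMBINING CARON
my $s = "n\x[030C]";

say $s.chars;        # 1  (graphemes: what you see as one "character")
say $s.NFD.codes;    # 2  (codepoints when fully decomposed)
say $s.NFC.codes;    # 1  (codepoints when composed into U+0148)

# Bytes depend on the encoding and on the normalization form encoded.
# Assumption: Str.encode works from the composed form, which would give 2
# here rather than the 3 bytes of the decomposed form.
say $s.encode('UTF-8').bytes;
```

Same text, three or four defensible answers, which is why there is no single
"length" that is right for every purpose.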

(This may not matter much to you if you only ever deal with Basic Latin,
i.e. plain ASCII, like the U.S. uses. But for the past several days I've
had to deal with the names of sports teams from various European countries,
with things like ň and ğ and ş in them --- and that's ignoring the names in
Cyrillic or Hebrew characters. The U.S. is not the whole world. And it's
helpful when the language doesn't force me to jump through weird hoops to
deal with them.)

A string can't simultaneously be three different lists. So it ends up being
one thing, and we provide ways to decompose it into the various other
forms. But if you are just thinking of strings of text, we don't make you
think about that; we provide specific string operations instead of making
you figure out which way to decompose it and add the new part and recompose
it. The string operations operate at grapheme level, because when you are
thinking of it as text, that's usually what you intend: you see one
'character' (grapheme) there, not the two codepoints or the 3 bytes or
whatever. But those operations have to be clever, because what if you are
appending another combining character?
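
A small sketch of that last case, assuming the grapheme-level behavior
described above: appending a combining caron to "n" should not add a second
'character', it should change the one that is already there.

```perl6
my $t = "n";
say $t.chars;            # 1

# Append COMBINING CARON; it attaches to the preceding n.
$t ~= "\x[030C]";

say $t.chars;            # 1  (still one grapheme, not two)
say $t eq "\x[0148]";    # True (same text as precomposed U+0148)
```

With an array of characters you would have had to notice the combining mark
yourself and decide whether "length" went up or not.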

-- 
brandon s allbery kf8nh                               sine nomine associates
allber...@gmail.com                                  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net
