On Mon, 2005-04-11 at 14:12, Ingo Blechschmidt wrote:

> gcomnz wrote:
> > I'm writing a bunch of examples for perl 6 pleac and it seems rather
> > natural to expect $string.chars to return a list of unicode chars in
> > list context, however I can't find anything to confirm that. (The
> > other alternatives being split and unpack.)
> 
> I like that.

Same here, though I have to admit that I'm slow on this whole Unicode
thing, so I'm not sure what you mean by "Unicode chars". For example,
are you expecting to get "f", "f", "i" or "ﬃ" back when you say
"ﬃ".chars? More interestingly, what about all of the Arabic ligatures
which someone who speaks that language might reasonably expect to get
back as multiple "chars", but they have their own Unicode codepoint
(e.g. ﳳ which is "U+FCF3 ARABIC LIGATURE SHADDA WITH DAMMA MEDIAL FORM"
which you might expect to get a shadda and a damma from)? Any Arabic speakers to
confirm or deny this behavior of ligatures?

Please be aware, I'm talking about ligatures above, NOT special letters
such as "Æ", which are their own letters, and cannot be decomposed into
"a", "e" without losing information.
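
To make the distinction concrete, here is a rough sketch in Python (not
Perl 6, whose .chars semantics are exactly what is in question). It
assumes that "splitting a ligature" means Unicode compatibility
decomposition (NFKD), which is only one possible reading; the helper
names code_points and compat_decompose are made up for this example.

```python
import unicodedata

def code_points(s):
    """Return the string as a list of its individual code points."""
    return list(s)

def compat_decompose(s):
    """Return the code points of the NFKD (compatibility) decomposition."""
    return list(unicodedata.normalize("NFKD", s))

ffi = "\uFB03"           # LATIN SMALL LIGATURE FFI
shadda_damma = "\uFCF3"  # ARABIC LIGATURE SHADDA WITH DAMMA MEDIAL FORM
ae = "\u00C6"            # LATIN CAPITAL LETTER AE

# As code points, each of these strings is a single element.
print(len(code_points(ffi)), len(code_points(shadda_damma)), len(code_points(ae)))

# Under NFKD the ligatures come apart, but the letter AE does not.
print(compat_decompose(ffi))
print([unicodedata.name(c) for c in compat_decompose(shadda_damma)])
print(compat_decompose(ae) == [ae])
```

So a code-point view returns the ligature whole, a compatibility view
splits it, and "Æ" stays intact either way, since it has no
decomposition mapping.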

Given Parrot, what happens when you are presented with a Big5 string
that does not have a strict Unicode equivalent? Does .chars throw an
exception, or does it rely on the string to know how to "characterify
itself" according to its vtable?

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback

