On Tuesday, April 28, 2015 at 9:23:40 PM UTC-7, [email protected] wrote:
>
> (Much more important is handling of len(), looping over a string and so
> on. But they are another story.)
>
That's a straight encoding problem. When the literal is a "str" (Python 2,
and hence Sage), you get its UTF-8 encoding as bytes (with a UTF-8 terminal,
as here), so ä consists of two bytes even though it prints as one character.
Iterating over a str iterates over those bytes. A "unicode" object consists
of Unicode code points, so ä is one unit:
sage: len(u"Direct translation of 'Mäntysalo' is 'Pine forest'")
50
sage: len("Direct translation of 'Mäntysalo' is 'Pine forest'")
51
sage: print u"Direct translation of 'Mäntysalo' is 'Pine forest'"[24:26]
än
sage: print "Direct translation of 'Mäntysalo' is 'Pine forest'"[24:26]
ä
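For comparison, here is the same experiment in Python 3. This is a sketch,
not part of the original session. In Python 3, str plays the role of Python 2's
"unicode" and bytes plays the role of a Python 2 "str" holding UTF-8 data. The
explicit .encode("utf-8") is an assumption standing in for the UTF-8 terminal
used above:

```python
# Python 3: str is a sequence of code points, bytes a sequence of raw bytes.
text = "Direct translation of 'Mäntysalo' is 'Pine forest'"
data = text.encode("utf-8")  # assumed to match the UTF-8 bytes in the Sage session

print(len(text))  # 50: 'ä' is one code point
print(len(data))  # 51: 'ä' is two bytes, b'\xc3\xa4'

print(text[24:26])                  # än
print(data[24:26])                  # b'\xc3\xa4' (the two bytes of 'ä')
print(data[24:26].decode("utf-8"))  # ä

# Iterating over str yields characters; iterating over bytes yields ints.
print([c for c in text[23:26]])  # ['M', 'ä', 'n']
print(list(data[23:27]))         # [77, 195, 164, 110]
```

The byte slice [24:26] prints as ä in the Sage session only because the
terminal reassembles the two UTF-8 bytes; Python 3 makes that distinction
visible in the repr.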
If you're going to use Unicode (i.e., characters that don't fit in ASCII),
use "unicode" objects. That's what Python 3 does for all strings: its str
type is the equivalent of Python 2's unicode.
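A minimal Python 3 sketch of that advice: decode incoming bytes to text once,
do all counting and slicing on the text, and encode only when writing bytes back
out. The helper name count_characters is hypothetical, and UTF-8 is assumed as
the input encoding:

```python
def count_characters(raw: bytes, encoding: str = "utf-8") -> int:
    """Hypothetical helper: count code points by decoding the bytes first."""
    return len(raw.decode(encoding))


raw = "Mäntysalo".encode("utf-8")  # bytes as they might arrive from a file or socket
print(len(raw))               # 10: counts bytes, so 'ä' counts twice
print(count_characters(raw))  # 9: counts code points
```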
--
You received this message because you are subscribed to the Google Groups
"sage-support" group.
Visit this group at http://groups.google.com/group/sage-support.