EuanM wrote
> ...
> all ISO-8859-1 maps 1:1 to Unicode UTF-8
> ...
I am late coming into this conversation. If it hasn't already been said, please do not conflate Unicode and UTF-8. I think that would be a recipe for a high P.I.T.A. factor.

Unicode defines the meaning of the code points. UTF-8 (and UTF-16) define an interchange mechanism. In other words, when you write the code points to an external medium (socket, file, whatever), encode them via UTF-whatever. When you read UTF-whatever from an external medium, decode it and re-instantiate the code points. (Personally, I see no use for UTF-16 as an interchange mechanism. Others may have justification for it. I don't.)

Having characters be a consistent size in their object representation makes everything easier: #at:, #indexOf:, #includes: ... No one wants to be scanning through bytes that represent variable-sized characters.

Model Unicode strings using classes such as Unicode7, Unicode16, and Unicode32, with automatic coercion to the larger character width.

--
View this message in context: http://forum.world.st/Unicode-Support-tp4865139p4866610.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
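
To make the distinction between code points and their encoded form concrete, here is a minimal sketch in Python (not Smalltalk, purely for illustration). The helper names `read_text`, `write_text`, and `storage_width` are hypothetical and come from no library; `storage_width` only mimics the idea of picking Unicode7, Unicode16, or Unicode32 by the widest code point present. It assumes that "Unicode7" means code points below 128.

```python
def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes from an external medium into in-memory code points."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Encode in-memory code points for an external medium."""
    return text.encode(encoding)


def storage_width(text: str) -> int:
    """Smallest fixed width in bits (7, 16, or 32) that fits every code point.

    Hypothetical stand-in for choosing between Unicode7/Unicode16/Unicode32.
    """
    widest = max(map(ord, text), default=0)
    if widest < 0x80:
        return 7
    if widest < 0x10000:
        return 16
    return 32


# ISO-8859-1 bytes map 1:1 to the code points U+0000..U+00FF ...
e_acute = read_text(bytes([0xE9]), "latin-1")
assert ord(e_acute) == 0xE9
# ... but not to UTF-8 bytes: the same code point takes two bytes in UTF-8.
assert write_text(e_acute) == b"\xc3\xa9"

# In memory, indexing and searching work on code points, not bytes.
text = read_text(write_text("na\u00efve \u20ac5"))
assert text[2] == "\u00ef"
assert text.index("\u20ac") == 6
assert len(text) == 8 and len(write_text(text)) == 11

# A wider character forces a wider fixed-width representation.
assert storage_width("abc") == 7
assert storage_width(text) == 16
assert storage_width("\U0001F600") == 32
```

The point of the sketch is only that decoding happens once at the boundary; after that, character offsets and byte offsets are separate concerns.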
