On Wed, Jun 4, 2014 at 2:40 PM, Rustom Mody <rustompm...@gmail.com> wrote:
> On Wednesday, June 4, 2014 9:22:54 AM UTC+5:30, Chris Angelico wrote:
>> On Wed, Jun 4, 2014 at 1:37 PM, Rustom Mody wrote:
>> > And so a pure BMP-supporting implementation may be a reasonable
>> > compromise. [As long as no surrogate-pairs are there]
>> Not if you're working on the internet. There are several critical
>> groups of characters that aren't in the BMP, such as:
> Of course. But what has the internet to do with micropython?
Earlier you said:
> IOW from pov of a universally acceptable character set this is mostly
"Universally acceptable character set" and microcontrollers may well
not meet, but if you're talking about universality, you need Unicode.
It's that simple.
Maybe there's a use-case for a microcontroller that works in
ISO-8859-5 natively, thus using only eight bits per character, but
even if there is, I would expect a Python implementation on it to
expose Unicode codepoints in its strings. (Most of the time you won't
even be aware of the exact codepoint values. It's only when you put
\xNN or \uNNNN or \U000NNNNN escapes into your strings, or explicitly
use ord/chr or equivalent, that it'd make a difference.) The point is
not that you might be able to get away with sticking your head in the
sand and wishing Unicode would just go away. Even if you can, it's not
something Python 3 can ever do.
And I don't think anybody can, anyway. If your device is big enough to
hold Python, it should be big enough to handle Unicode; and then you
don't have to say "Oh, sorry rest-of-the-world, this only works in
English... and only a subset of English... and stuff".
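
To make the earlier point about escapes and ord/chr concrete, here is a
rough sketch, assuming CPython 3.3 or later (I have not checked what
MicroPython does). The sample characters are arbitrary picks of mine.
It shows that a string exposes Unicode codepoints even when the bytes
on the wire are ISO-8859-5, and that a non-BMP character takes a
surrogate pair in UTF-16:

```python
# Three characters written with the three escape forms:
# U+00E9 (e with acute), U+0416 (Cyrillic ZHE), U+1F600 (an emoji outside the BMP).
s = "\xe9\u0416\U0001F600"

# ord() reports Unicode codepoints, whatever the storage or wire encoding.
code_points = [ord(c) for c in s]
length = len(s)  # one item per codepoint on CPython 3.3+

# ZHE fits in a single byte when encoded as ISO-8859-5...
zhe_bytes = "\u0416".encode("iso-8859-5")
# ...but decoding that byte gives back the Unicode codepoint, not the byte value.
zhe = zhe_bytes.decode("iso-8859-5")

# A non-BMP character needs two 16-bit code units (a surrogate pair) in UTF-16.
utf16 = "\U0001F600".encode("utf-16-be")

print([hex(cp) for cp in code_points], length)
print(zhe_bytes, hex(ord(zhe)))
print(len(utf16))
```

So the eight-bit encoding is purely a storage detail; code that calls
ord/chr still sees 0x416, and a BMP-only implementation would have no
single codepoint to hand back for the last character.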