On Wednesday, June 4, 2014 3:11:12 AM UTC+5:30, Paul Sokolovsky wrote:
> With that in mind, I, as many others, think that forcing Unicode bloat
> upon people by default is the most controversial feature of Python3.
> The reason is that you can go a very long way dealing with languages of the
> people of the world by just treating strings as consisting of 8-bit
> data. I'd say, that's enough for 90% of applications. Unicode is needed
> only if one needs to deal with multiple languages *at the same time*,
> which is fairly rare (remaining 10% of apps).
> And please keep in mind that MicroPython was originally intended for (and
> should remain scalable down to) an MCU. Unicode is needed there even
> less, and there are even fewer resources to support Unicode just because.
At some point (when jmf was making more intelligible noises) I had
suggested that the choice between 1/2/4 byte strings that happens at
runtime in python3's FSR (flexible string representation) could be made
at python-start time with a command-line switch. There are many
combinations here; here is one:
Instead of having one (FSR) string engine, you have (up to) 4:
- a pure 1 byte (ASCII)
- a pure 2 byte (BMP) with decode-failures for out-of-ranges
- a pure 4 byte -- everything UTF-32
- FSR dynamic switching at runtime (with massive moping from the world's jmfs)
The point is that only one of these engines would be brought into memory
based on command-line/config options.
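A rough sketch of what I mean, in Python rather than in the C where it would really live. The --string-engine option, the engine names, and the helper functions are all made up for illustration; as far as I know nothing like this exists in CPython or MicroPython. The "fsr" branch uses the 1/2/4-byte thresholds that the FSR applies per string.

```python
import argparse

# Hypothetical engine names mapped to the highest code point each accepts.
ENGINE_MAX_CODEPOINT = {
    "ascii": 0x7F,      # pure 1-byte engine
    "bmp": 0xFFFF,      # pure 2-byte engine, nothing outside the BMP
    "utf32": 0x10FFFF,  # pure 4-byte engine
    "fsr": 0x10FFFF,    # width picked per string at runtime
}

# Bytes per character for the fixed-width engines.
FIXED_WIDTH = {"ascii": 1, "bmp": 2, "utf32": 4}


def parse_engine(argv):
    """Read the (hypothetical) --string-engine switch; default to FSR."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--string-engine",
        choices=sorted(ENGINE_MAX_CODEPOINT),
        default="fsr",
    )
    return parser.parse_args(argv).string_engine


def storage_width(engine, text):
    """Return bytes per character for text, or fail if it is out of range."""
    highest = max(map(ord, text), default=0)
    if highest > ENGINE_MAX_CODEPOINT[engine]:
        raise ValueError(
            f"U+{highest:04X} is out of range for the {engine!r} engine"
        )
    if engine != "fsr":
        return FIXED_WIDTH[engine]
    # FSR: the widest character in the string decides the width.
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


if __name__ == "__main__":
    engine = parse_engine(["--string-engine", "bmp"])
    print(engine, storage_width(engine, "h\u00e9llo"))
```

With the "bmp" engine a string containing a mahjong tile is rejected up front, whereas with "fsr" it silently widens to 4 bytes per character.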
Some more personal thoughts (that may be quite ill-informed!):
1. I regard myself as a unicode ignoramus+enthusiast. The world will
be a better place if unicode is more pervasive.
As it happens I am also a computer scientist -- I understand that in
contexts where anything other than 8-bit chars is unacceptably
inefficient, unicode-bloat may be a real thing.
2. My casual/cursory reading of the contents of the SMP
suggests that the stuff there is things like
- egyptian hieroglyphics
- mahjong characters
- ancient greek musical symbols
- alchemical symbols etc etc.
IOW from pov of a universally acceptable character set this is mostly
And so a pure BMP-supporting implementation may be a reasonable
compromise. [As long as no surrogate-pairs are there]
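For anyone who wants to see how much of their own text would survive a BMP-only engine, here is a small check in ordinary Python 3. The function names are mine, not part of any library; only ord() and unicodedata.name() are standard.

```python
import unicodedata


def is_bmp_only(text):
    """True if every character fits in 16 bits (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def has_surrogates(text):
    """True if the text contains any code point in the surrogate range."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def astral_characters(text):
    """List the characters that a pure BMP engine would have to reject."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0xFFFF
    ]


if __name__ == "__main__":
    sample = "tile: \U0001F000"
    print(is_bmp_only(sample))
    print(astral_characters(sample))
```

The names reported depend on the Unicode version your Python's unicodedata module ships with.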