On Wednesday, June 4, 2014 10:50:21 AM UTC+5:30, Steven D'Aprano wrote:
> On Tue, 03 Jun 2014 20:37:27 -0700, Rustom Mody wrote:
> > And so a pure BMP-supporting implementation may be a reasonable
> > compromise. [As long as no surrogate-pairs are there]
> At the cost of one extra bit, strings could use UTF-16 internally and
> still have correct behaviour. The bit could be a flag recording whether
> the string contains any surrogate pairs. If the flag was 0, all string
> operations could assume a constant 2-bytes-per-character. If the flag was
> 1, it could fall back to walking the string checking for surrogate pairs.
Yes. That could be one possibility. My main reason for giving the
4-engine choice was not that 4 engines are a good idea, but that in the
very differently constrained world of μ-controllers, playing around with
alternative binding times may be advantageous.
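
To make the flag idea quoted above concrete, here is a rough sketch in
Python. The class name FlaggedUTF16 and its attributes are made up for
illustration; this is not how CPython or MicroPython store strings, and
it assumes the input is valid text with no lone surrogates.

```python
import struct


class FlaggedUTF16:
    """Hypothetical string type: UTF-16 code units plus a one-bit flag."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        # Unpack the encoded bytes into a tuple of 16-bit code units.
        self.units = struct.unpack("<%dH" % (len(data) // 2), data)
        # The one extra bit: does any code unit lie in the surrogate range?
        self.has_surrogates = any(0xD800 <= u <= 0xDFFF for u in self.units)

    def __len__(self):
        if not self.has_surrogates:
            # Fast path: one code unit per character.
            return len(self.units)
        # Slow path: count every unit except low (trailing) surrogates.
        return sum(1 for u in self.units if not 0xDC00 <= u <= 0xDFFF)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indices are not handled in this sketch")
        if not self.has_surrogates:
            # Fast path: constant 2 bytes per character, direct indexing.
            return chr(self.units[index])
        # Slow path: walk the string, pairing up surrogates as we go.
        pos = 0
        i = 0
        while i < len(self.units):
            unit = self.units[i]
            if 0xD800 <= unit <= 0xDBFF and i + 1 < len(self.units):
                low = self.units[i + 1]
                code_point = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                step = 2
            else:
                code_point = unit
                step = 1
            if pos == index:
                return chr(code_point)
            pos += 1
            i += step
        raise IndexError("string index out of range")
```

A BMP-only string such as FlaggedUTF16("héllo") stays on the fast path
throughout, while FlaggedUTF16("a\U0001F600b") sets the flag and pays
for the walk on every index.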
> > On Wednesday, June 4, 2014 3:11:12 AM UTC+5:30, Paul Sokolovsky wrote:
> >> With that in mind, I, as many others, think that forcing Unicode bloat
> >> upon people by default is the most controversial feature of Python3.
> >> The reason is that you go very long way dealing with languages of the
> >> people of the world by just treating strings as consisting of 8-bit
> >> data. I'd say, that's enough for 90% of applications. Unicode is needed
> >> only if one needs to deal with multiple languages *at the same time*,
> >> which is fairly rare (remaining 10% of apps).
> >> And please keep in mind that MicroPython was originally intended for
> >> (and should remain scalable down to) an MCU. Unicode is needed even
> >> less there, and there are even fewer resources to support Unicode
> >> just because.
> > At some time (when jmf was making more intelligible noises) I had
> > suggested that the choice between 1/2/4 byte strings that happens at
> > runtime in python3's FSR can be made at python-start time with a
> > command-line switch. There are many combinations here; here is one in
> > more detail:
> > Instead of having one (FSR) string engine, you have (up to) 4
> > - a pure 1 byte (ASCII)
> There are only 128 ASCII characters, so a pure ASCII implementation
> cannot even represent arbitrary bytes.
Yes this is a subtle point.
I was initially going to write Latin-1, but wrote a rough-and-ready ASCII.
But maybe it could be a choice.
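
The difference Steven points out is easy to check in stock Python 3.
A small sketch (the variable names are only for illustration):

```python
# Every possible byte value, 0 through 255.
all_bytes = bytes(range(256))

# Latin-1 maps byte N to code point N, so arbitrary bytes round-trip.
as_latin1 = all_bytes.decode("latin-1")
round_trip = as_latin1.encode("latin-1")

# ASCII only covers 0 through 127, so decoding arbitrary bytes fails.
try:
    all_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

So a 1-byte engine that has to carry arbitrary bytes would need to be
Latin-1 rather than pure ASCII.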
I really don't understand the binding times of μ-controllers.
My impression is that actual development is split between
1 tinkering with the board
2 working on full-powered computers and downloading to the board
In going from 2 to 1, heavy cut-downs are probably possible and
desirable. If this is the case, having hooks in the system for making
such choices may be a good idea, and making them optimally may be
worthwhile.