On Wednesday, June 4, 2014 10:50:21 AM UTC+5:30, Steven D'Aprano wrote:
> On Tue, 03 Jun 2014 20:37:27 -0700, Rustom Mody wrote:
> > And so a pure BMP-supporting implementation may be a reasonable
> > compromise. [As long as no surrogate-pairs are there]
>
> At the cost of one extra bit, strings could use UTF-16 internally and
> still have correct behaviour. The bit could be a flag recording whether
> the string contains any surrogate pairs. If the flag was 0, all string
> operations could assume a constant 2-bytes-per-character. If the flag was
> 1, it could fall back to walking the string checking for surrogate pairs.

Yes. That could be one possibility.
My main reason for giving the 4-engine choice was not that 4 engines are a
good idea, but that in the very differently constrained world of
μ-controllers, playing around with alternate binding times may be
advantageous.

> > On Wednesday, June 4, 2014 3:11:12 AM UTC+5:30, Paul Sokolovsky wrote:
> >> With that in mind, I, as many others, think that forcing Unicode bloat
> >> upon people by default is the most controversial feature of Python3.
> >> The reason is that you go very long way dealing with languages of the
> >> people of the world by just treating strings as consisting of 8-bit
> >> data. I'd say, that's enough for 90% of applications. Unicode is needed
> >> only if one needs to deal with multiple languages *at the same time*,
> >> which is fairly rare (remaining 10% of apps).
> >> And please keep in mind that MicroPython was originally intended (and
> >> should remain scalable down to) an MCU. Unicode needed there is even
> >> less, and even less resources to support Unicode just because.
> > At some time (when jmf was making more intelligible noises) I had
> > suggested that the choice between 1/2/4 byte strings that happens at
> > runtime in python3's FSR can be made at python-start time with a
> > command-line switch. There are many combinations here; here is one in
> > more detail:
> > Instead of having one (FSR) string engine, you have (up to) 4
> > - a pure 1 byte (ASCII)

> There are only 128 ASCII characters, so a pure ASCII implementation
> cannot even represent arbitrary bytes.

Yes this is a subtle point. I was initially going to write Latin-1.
Wrote a rough-n-ready ASCII. But maybe it could be a choice.

I really don't understand the binding times of μ-controllers.
My impression is that actual development is split between:
1. tinkering with the board
2. working on full-powered computers and downloading to the board

In going from 2 to 1, heavy amounts of cut-downs are probably possible and
desirable. If this is the case, having hooks in the system for making
such choices may be worthwhile.
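
To make the one-extra-bit idea quoted above concrete, here is a rough
sketch in Python. It is only an illustration, not how CPython or
MicroPython store strings: the class name FlaggedUTF16 and its attributes
are made up for this example, and it assumes well-formed input (no lone
surrogates). The flag is computed once at construction; indexing and
length take the constant-width path when it is clear and walk the code
units when it is set.

```python
import struct


class FlaggedUTF16:
    """Hypothetical string type: UTF-16 code units plus a one-bit flag."""

    def __init__(self, text):
        # Encode to UTF-16 (little-endian, no BOM) and unpack into
        # a list of 16-bit code units.
        raw = text.encode("utf-16-le")
        self.units = list(struct.unpack("<%dH" % (len(raw) // 2), raw))
        # The "one extra bit": True if any code unit is a surrogate.
        self.has_surrogates = any(0xD800 <= u <= 0xDFFF for u in self.units)

    def __len__(self):
        if not self.has_surrogates:
            # Fast path: exactly one code unit per character.
            return len(self.units)
        # Slow path: count every unit except low (trailing) surrogates.
        return sum(1 for u in self.units if not 0xDC00 <= u <= 0xDFFF)

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if index < 0:
            raise IndexError("index out of range")
        if not self.has_surrogates:
            # Fast path: constant 2 bytes per character, O(1) lookup.
            return chr(self.units[index])
        # Slow path: walk the units, treating each surrogate pair as
        # a single character.
        pos = 0
        i = 0
        while i < len(self.units):
            unit = self.units[i]
            width = 1
            code_point = unit
            if 0xD800 <= unit <= 0xDBFF and i + 1 < len(self.units):
                low = self.units[i + 1]
                if 0xDC00 <= low <= 0xDFFF:
                    code_point = (0x10000 + ((unit - 0xD800) << 10)
                                  + (low - 0xDC00))
                    width = 2
            if pos == index:
                return chr(code_point)
            pos += 1
            i += width
        raise IndexError("index out of range")
```

A BMP-only string such as FlaggedUTF16("héllo") never leaves the fast
path, while FlaggedUTF16("a\U0001F600b") sets the flag and pays for the
walk on every index. Whether that cost is acceptable on an MCU is exactly
the kind of choice that could be bound earlier than runtime.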