On 5 June 2014 22:01, Paul Sokolovsky <pmis...@gmail.com> wrote:
>> Aside from
>> some of the POSIX locale handling issues on Linux, many of the
>> concerns are with the usability of bytes and bytearray, not with str -
>> that's why binary interpolation is coming back in 3.5, and there will
>> likely be other usability tweaks for those types as well.
>
> All these changes are what let me dream on and speculate on
> possibility that Python4 could offer an encoding-neutral string type
> (which means based on bytes), while move unicode back to an explicit
> type to be used explicitly only when needed (bloated frameworks like
> Django can force users to it anyway, but that will be forcing on
> framework level, not on language level, against which people rebel.)
> People can dream, right?

If you don't model strings as arrays of code points, or at least assume
a particular universal encoding (like UTF-8), you have to give up string
concatenation in order to tolerate arbitrary encodings. Otherwise you end
up with unintelligible data that nobody can decode, because it switches
encodings without notice.

Assuming a universal encoding is a viable model if your OS guarantees it
(Mac OS X does, for example, so Python 3 assumes UTF-8 for all OS
interfaces there), but Linux currently offers no such guarantee. Many
runtimes just decide they don't care and assume UTF-8 anyway. Python 3 may
even join them some day, due to the problems caused by trusting the locale
encoding to be correct, but the startup code will need non-trivial changes
for that to happen, and the C.UTF-8 locale may even become widespread
before we get there.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
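
For readers who want to see the concatenation problem from the reply in concrete terms, here is a minimal sketch. The sample strings and the helper name try_decode are made up for illustration and are not from the thread; the sketch only assumes standard Python 3 bytes and str behaviour.

```python
# Two byte strings that hold text in different encodings (illustrative data).
latin1_part = "café ".encode("latin-1")   # b'caf\xe9 '
utf8_part = "naïve".encode("utf-8")       # b'na\xc3\xafve'

# Concatenating bytes succeeds silently: nothing records where the
# encoding switches from Latin-1 to UTF-8.
mixed = latin1_part + utf8_part


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or None on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# No single encoding recovers the original text from the mixed bytes.
as_utf8 = try_decode(mixed, "utf-8")      # fails on the lone 0xE9 byte
as_latin1 = try_decode(mixed, "latin-1")  # "succeeds" but yields mojibake

# Decoding each piece with its own encoding first, then concatenating
# code points, keeps the result intelligible.
text = latin1_part.decode("latin-1") + utf8_part.decode("utf-8")

print(as_utf8)
print(as_latin1)
print(text)
```

Decoding at the boundary and concatenating str values is the code-point model described above; the bytes-only path works only if every producer agrees on one encoding in advance.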