On Tue, Jul 17, 2018 at 2:31 PM Marko Rauhamaa <ma...@pacujo.net> wrote:
>
> Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
> > On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:
> >> UTF-8 bytes can only represent the first 128 code points of Unicode.
> >
> > This is DailyWTF material. Perhaps you want to rethink your wording
> > and maybe even learn a bit more about Unicode and the UTF encodings
> > before making such statements.
> >
> > The idea that UTF-8 bytes cannot represent the whole of Unicode is not
> > even wrong. Of course a *single* byte cannot, but a single byte is not
> > "UTF-8 bytes".
>
> So I hope that by now you have understood my point and been able to
> decide if you agree with it or not.
>
>
> Marko

I still don't understand what your original point is.
I think UTF-8 vs. UTF-32 is a totally different question from Python 2 vs. 3.

For example, strings in Rust and Swift (2010s languages!) are *valid* UTF-8.
There is a strong separation between byte arrays and strings, even though
they use UTF-8. They look similar to Python 3, not Python 2.

And Python can use UTF-8 as its internal encoding in the future. AFAIK, PyPy
is trying it now. If they succeed, I want to try porting it to CPython after
we remove the legacy Unicode APIs (ref. PEP 393).

So "UTF-8 is better than UTF-32" is a totally different problem from
"Python 2 is better than 3".

Is your point that "accepting invalid UTF-8 implicitly by default is better
than an explicit 'surrogateescape' error handler", as Go does? (Go is also a
2010s language with UTF-8 based strings, but it accepts invalid UTF-8.)

Regards,
--
INADA Naoki <songofaca...@gmail.com>
--
https://mail.python.org/mailman/listinfo/python-list
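
P.S. To make the bytes/str separation and the 'surrogateescape' handler concrete, here is a minimal Python 3 sketch. The sample strings and variable names are made up for illustration; only the standard `str.encode` / `bytes.decode` calls are used. It shows that code points above 127 encode to multi-byte UTF-8 sequences, that strict decoding rejects invalid UTF-8, and that the explicit error handler lets invalid bytes round-trip.

```python
# Illustrative sample text (hypothetical data): U+00E9 and U+20AC are both
# code points above 127.
text = "caf\u00e9 \u20ac"

# Encoding gives a bytes object; non-ASCII code points take several bytes.
encoded = text.encode("utf-8")
round_trip = encoded.decode("utf-8")

# 0xFF never appears in well-formed UTF-8, so this byte string is invalid.
invalid = b"abc\xff"

# By default (errors="strict") Python 3 refuses to decode invalid UTF-8.
try:
    invalid.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With the explicit 'surrogateescape' handler, each undecodable byte is
# mapped to a lone surrogate (0xFF becomes U+DCFF) ...
escaped = invalid.decode("utf-8", errors="surrogateescape")

# ... and encoding with the same handler restores the original bytes.
restored = escaped.encode("utf-8", errors="surrogateescape")

print(len(text), len(encoded), strict_ok, restored == invalid)
```

The point of the sketch is only that accepting invalid UTF-8 is opt-in here, which is the design choice I was asking about.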