On Sat, Jul 15, 2017 at 12:32 AM, Michael Torrie <torr...@gmail.com> wrote:
> On 07/14/2017 08:05 AM, Rhodri James wrote:
>> On 14/07/17 14:31, Marko Rauhamaa wrote:
>>> Of course, UTF-8 in a bytes object doesn't make the situation any
>>> better, but does it make it any worse?
>>
>> Speaking as someone who has been up to his elbows in this recently, I
>> would say emphatically that it does make things worse. It adds an extra
>> layer of complexity to all of the questions you were asking, and more.
>> A single codepoint is a meaningful thing, even if its meaning may be
>> modified by combining. A single byte may or may not be meaningful.
>
> Are you saying that dealing with Unicode in Google Go, which uses UTF-8
> in memory, is adding an extra layer of complexity and makes things worse
> than they might be in Python?

Can you reverse a string in Go? How do you do it?

With Python, you can sometimes get tripped up, e.g. if you have:

* combining characters
* Arabic letters, which can look very different when reordered
* explicit directionality markers

But the semantics are at least easy to comprehend: you have a strict
reversal of code point order. So you can reverse a string for parsing
purposes, and then re-reverse the subsections.

If you have a UTF-8 bytestring, a naive reversal will trip you up if
you have *any* non-ASCII values in there. You will have invalid UTF-8.
So *at the very least*, your "reverse string" code has to be UTF-8
aware - it has to keep continuation bytes with the correct start byte.
And you *still* have all the concerns that Python has.

Extra complexity. QED.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
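
To make the point about reversal concrete, here is a rough sketch in Python. The names naive_reverse and reverse_utf8 are made up for illustration, and reverse_utf8 assumes its input is already valid UTF-8 (it does no validation). It keeps each run of continuation bytes (0b10xxxxxx) attached to its start byte, which is the minimum a UTF-8-aware reversal has to do.

```python
def naive_reverse(data: bytes) -> bytes:
    """Reverse byte by byte, ignoring the encoding entirely."""
    return data[::-1]


def reverse_utf8(data: bytes) -> bytes:
    """Reverse UTF-8 text by code point without decoding it.

    Illustrative only: assumes `data` is valid UTF-8.
    """
    out = bytearray()
    end = len(data)
    # Walk backwards; every byte that is NOT a continuation byte
    # (0b10xxxxxx) starts a code point that runs up to `end`.
    for i in range(len(data) - 1, -1, -1):
        if data[i] & 0xC0 != 0x80:
            out += data[i:end]
            end = i
    return bytes(out)
```

With these, naive_reverse("héllo".encode("utf-8")) yields bytes that can no longer be decoded as UTF-8, while reverse_utf8 on the same input decodes to the same thing as "héllo"[::-1]. Neither version does anything about combining characters: "cafe\u0301" still ends up with the accent detached from its "e" either way, which is the "you still have all the concerns that Python has" part.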