Re: Grapheme clusters, a.k.a. real characters

Rhodri James Fri, 14 Jul 2017 07:55:46 -0700

On 14/07/17 15:32, Michael Torrie wrote:

On 07/14/2017 08:05 AM, Rhodri James wrote:

On 14/07/17 14:31, Marko Rauhamaa wrote:

Of course, UTF-8 in a bytes object doesn't make the situation any
better, but does it make it any worse?


Speaking as someone who has been up to his elbows in this recently, I
would say emphatically that it does make things worse.  It adds an extra
layer of complexity to all of the questions you were asking, and more.
A single codepoint is a meaningful thing, even if its meaning may be
modified by combining.  A single byte may or may not be meaningful.


Are you saying that dealing with Unicode in Google Go, which uses UTF-8
in memory, is adding an extra layer of complexity and makes things worse
than they might be in Python?

I'm not familiar with Go.  If the programmer has to be aware that he or
she is using UTF-8 under the hood, then yes, it does add an extra layer
of complexity.  You have to remember the rules of UTF-8 as well as
everything else.
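
As a rough illustration of the difference between indexing code points
and indexing UTF-8 bytes, here is a minimal Python sketch.  The sample
string and the helper name decodes_cleanly are made up for this example
and do not come from the thread; it only shows that a byte offset can
land in the middle of a character, whereas a str index cannot.

```python
# Illustrative sketch only: the sample text and helper name are assumptions.
text = "na\u00efve"              # "naive" with a precomposed i-diaeresis: 5 code points
data = text.encode("utf-8")      # 6 bytes, because U+00EF needs two bytes in UTF-8

# Indexing the str always yields a whole code point.
third_char = text[2]             # U+00EF

# Indexing the bytes yields only a fragment of that character.
third_byte = data[2]             # 0xC3, the lead byte of the two-byte sequence


def decodes_cleanly(chunk: bytes) -> bool:
    """Return True if chunk is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


cut_mid_char = data[:3]          # stops between the two bytes of U+00EF
cut_on_boundary = data[:4]       # stops on a character boundary

print(len(text), len(data))
print(decodes_cleanly(cut_mid_char), decodes_cleanly(cut_on_boundary))
```

With a str, any slice is still a valid string, even if it separates a
base character from its combining marks.  With bytes, the slice itself
may not be decodable, which is the extra set of rules to keep in mind.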


--
Rhodri James *-* Kynesim Ltd
--
https://mail.python.org/mailman/listinfo/python-list
