Guillaume Sanchez added the comment:

Hello,

I come from bugs.python.org/issue30717 . I have a pending PR that needs review 
( https://github.com/python/cpython/pull/2673 ) adding a function that breaks 
unicode strings into grapheme clusters (aka what one would intuitively call "a 
character"). It's based on the grapheme cluster breaking algorithm from TR29.

Let me know if this is of any relevance.

Quick demo:
>>> a=unicodedata.break_graphemes("lol")
>>> list(a)
['l', 'o', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309l"))
['l', 'ỏ', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309\u0301l"))
['l', 'ỏ́', 'l']
>>> list(unicodedata.break_graphemes("lo\u0301l"))
['l', 'ó', 'l']
>>> list(unicodedata.break_graphemes(""))
[]

----------
nosy: +Guillaume Sanchez

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12568>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to