Steven D'Aprano added the comment:

http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

talks about *grapheme clusters*, not "graphemes" alone, and it seems clear to 
me that they are language dependent. For example, it says:

    The Unicode Standard provides default algorithms for determining grapheme
    cluster boundaries, with two variants: legacy grapheme clusters and extended
    grapheme clusters. The most appropriate variant depends on the language and
    operation involved. ... These algorithms can be adapted to produce tailored
    grapheme clusters for specific locales...


Nevertheless, even just a basic API to either the *legacy grapheme cluster* or 
the *extended grapheme cluster* algorithms would be a good start.
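
To make concrete what such a basic API might return, here is a rough sketch.
It is deliberately NOT the UAX #29 algorithm: the function name
approx_grapheme_clusters is hypothetical, and it assumes only that a cluster
is a base character followed by any combining marks (general category M*),
with CR LF kept together. A real implementation would need the
Grapheme_Cluster_Break property and the full rule set from the report.

```python
import unicodedata


def approx_grapheme_clusters(text):
    """Split text into approximate grapheme clusters.

    Hypothetical helper for illustration only. This is a simplification,
    not UAX #29: it ignores Hangul syllable rules, ZWJ emoji sequences,
    regional indicators, prepended characters and locale tailoring.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")  # Mn, Mc, Me
        is_crlf = ch == "\n" and bool(clusters) and clusters[-1] == "\r"
        if clusters and (is_mark or is_crlf):
            # Attach the mark (or the LF of a CR LF pair) to the previous cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters
```

With this sketch, "e\u0301" (an "e" followed by COMBINING ACUTE ACCENT) has
len() == 2 as a str but comes back as a single cluster, which is the kind of
user-perceived character count that len() alone cannot give.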

Can I suggest that the unicodedata module might be the right place for it?

And thank you for volunteering to do the work on this!

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30717>
_______________________________________