Terry J. Reedy added the comment:
For future reference, small code examples should be included in the message or
uploaded as a .py file.
A Unicode string is a sequence of codepoints. The length is defined as the
number of codepoints. I cannot see that your example demonstrates a bug in
Steven D'Aprano added the comment:
By the way, perhaps a simpler demonstration which is more likely to render
correctly on most people's systems would be to use Latin-1 combining characters:
py> import unicodedata
py> s1 = 'àéîõü'
py> s2 = unicodedata.normalize('NFD', s1) # decompose into combining chars
py> s1, s2
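
A minimal standalone sketch of the same comparison, assuming Python 3. The escapes spell the same five letters so the result does not depend on how the file or terminal is encoded. The two strings should render identically, but len() counts codepoints, so the decomposed form is twice as long:

```python
import unicodedata

# The five precomposed Latin-1 letters: a-grave, e-acute, i-circumflex, o-tilde, u-diaeresis
s1 = '\xe0\xe9\xee\xf5\xfc'
# NFD splits each letter into a base letter plus one combining mark
s2 = unicodedata.normalize('NFD', s1)

print(len(s1))   # 5 codepoints
print(len(s2))   # 10 codepoints: 5 base letters + 5 combining marks
print(s1 == s2)  # False, although both display as the same text
print(unicodedata.normalize('NFC', s2) == s1)  # True after recomposing
```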
Steven D'Aprano added the comment:
I don't really understand your example code. What result did you expect? The
output shown in Github seems correct to me:
optional arguments:
-h, --help  show this help message and exit
--language1 XX
Lanugage for
New submission from Vanessa McHale:
Currently, Python computes string widths based on the number of characters.
However, this will not work in general because some languages have e.g. vowel
markers: 'བོད' for instance is three characters but its width should be two.
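
A rough sketch of the kind of width calculation meant here, assuming that nonspacing and enclosing marks (Unicode categories Mn and Me) occupy no column of their own. The name display_width is hypothetical, not an existing standard-library function, and the sketch ignores double-width East Asian characters and other terminal-specific behaviour:

```python
import unicodedata

def display_width(text):
    """Hypothetical helper: count codepoints that are not combining marks."""
    # Assumption: Mn (nonspacing) and Me (enclosing) marks are drawn on the
    # preceding character and take no extra column.
    return sum(1 for ch in text if unicodedata.category(ch) not in ('Mn', 'Me'))

# The example string, spelled with escapes: letter BA, vowel sign O, letter DA
s = '\u0f56\u0f7c\u0f51'

print(len(s))            # 3 codepoints
print(display_width(s))  # 2 columns under the assumption above
```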
I have an example repo here: