[issue12568] Add functions to get the width in columns of a character

poq Sat, 10 Mar 2012 16:32:20 -0800

poq <[email protected]> added the comment:

It seems this is a bit of a minefield...


GNOME Terminal/libvte has an environment variable (VTE_CJK_WIDTH) to override 
the handling of ambiguous width characters. It bases its default on the locale 
(with the comment 'This is basically what GNU libc does').

urxvt just uses system wcwidth.

Xterm uses some voodoo to decide between system wcwidth and mk_wcwidth(_cjk): 
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

I think the simplest solution is to just expose libc's wcwidth and wcswidth. 
They are widely used and are the most likely to match the behaviour of the 
terminal.
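
For illustration, here is a minimal sketch of what reaching libc's wcwidth 
from Python could look like today, using ctypes. The name libc_wcwidth is 
hypothetical, and the sketch assumes a POSIX libc that exports wcwidth (so it 
will not work on Windows). Results depend on the current locale.

```python
import ctypes
import ctypes.util
import locale

# wcwidth() consults the current locale, so adopt the user's settings first.
try:
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    pass  # keep the default "C" locale if the environment's locale is invalid

# Assumption: a POSIX libc that exports wcwidth.
_libc = ctypes.CDLL(ctypes.util.find_library('c'))
_libc.wcwidth.argtypes = [ctypes.c_wchar]
_libc.wcwidth.restype = ctypes.c_int

def libc_wcwidth(c):
    """Return libc's column width for one character (-1 if non-printable)."""
    return _libc.wcwidth(c)
```

A call such as libc_wcwidth(u'a') then goes straight to the C library, so the 
answer follows whatever tables the system ships rather than Python's 
unicodedata.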

FWIW I wrote a little script to test the widths of all Unicode characters, and 
came up with the following logic to match libvte behaviour:

import unicodedata

def wcwidth(c, legacy_cjk=False):
        # tab, CR, LF, backspace, vertical tab, form feed
        if c in u'\t\r\n\10\13\14':
                raise ValueError('character %r has no intrinsic width' % c)
        if c in u'\0\5\7\16\17': return 0 # NUL, ENQ, BEL, SO, SI
        if u'\u1160' <= c <= u'\u11ff': return 0 # hangul jamo vowels/finals
        # combining marks and format characters, except 00ad = soft hyphen
        if unicodedata.category(c) in ('Mn', 'Me', 'Cf') and c != u'\u00ad':
                return 0
        eaw = unicodedata.east_asian_width(c)
        if eaw in ('F', 'W'): return 2 # fullwidth / wide
        if legacy_cjk and eaw == 'A': return 2 # ambiguous, CJK locale
        return 1
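
A per-character function like this is usually summed over a string to get a 
display width. The sketch below shows that step only. Both names are 
hypothetical, and demo_width is a deliberately reduced stand-in (combining 
marks and East Asian width only), not the full libvte-matching logic.

```python
import unicodedata

def demo_width(c):
    # Hypothetical stand-in: zero for combining marks, two for wide/fullwidth.
    if unicodedata.category(c) in ('Mn', 'Me'):
        return 0
    if unicodedata.east_asian_width(c) in ('F', 'W'):
        return 2
    return 1

def string_width(text, char_width=demo_width):
    """Sum the column widths of every character in text."""
    return sum(char_width(c) for c in text)
```

Any other width function with the same one-character signature can be passed 
as char_width instead.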

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12568>
_______________________________________