2011/1/5 Alexander Polakov <polac...@gmail.com>:
> 1) wcwidth(0x200B)
> This is from http://unicode.org/Public/UNIDATA/ :
>
> 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
> 200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;
> 200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;
>
> --- share/locale/ctype/en_US.UTF-8.src.orig     Tue Jan  4 22:49:22 2011
> +++ share/locale/ctype/en_US.UTF-8.src  Tue Jan  4 22:50:55 2011
> @@ -1672,7 +1672,8 @@
>  BLANK     0x2000 - 0x200b  0x202f  0x205f
>  PRINT     0x2000 - 0x200b  0x2010 - 0x2029  0x202f - 0x2052  0x2057
>  PRINT     0x205f
> -SWIDTH1   0x2000 - 0x200b  0x2010 - 0x2029  0x202f - 0x2052  0x2057
> +SWIDTH1   0x2000 - 0x200c  0x2010 - 0x2029  0x202f - 0x2052  0x2057
> +SWIDTH0   0x200b - 0x200d
>  SWIDTH1   0x205f

That only solves the test case. All combining characters (diacritical
marks), including 0x300, should be zero-width as well.

The accepted interpretation of the Unicode rules appears to be that the
Cf, Me and Mn categories, plus or minus a few characters, are to be
zero-width; see the comments in:
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
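
To make the rule concrete, here is a rough sketch in Python. It is not
the code from wcwidth.c and not what mklocale does. The function name
sketch_wcwidth is made up for illustration, the two exceptions (SOFT
HYPHEN and the Hangul Jamo range) are the ones I recall from the
comments in that file, East Asian wide characters are ignored, and
Python's unicodedata module carries a newer Unicode database than the
2011 one discussed here.

```python
import unicodedata

# General categories treated as zero-width under the rule described above.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def sketch_wcwidth(codepoint):
    """Hypothetical helper: return -1, 0 or 1 for a code point.

    Assumption: East Asian wide characters (width 2) are not handled.
    """
    if codepoint == 0:
        return 0
    if codepoint < 0x20 or 0x7F <= codepoint < 0xA0:
        return -1  # C0/C1 control characters have no defined width
    if codepoint == 0x00AD:
        return 1  # SOFT HYPHEN is Cf but is given one column
    if 0x1160 <= codepoint <= 0x11FF:
        return 0  # Hangul Jamo medial vowels and final consonants
    if unicodedata.category(chr(codepoint)) in ZERO_WIDTH_CATEGORIES:
        return 0
    return 1
```

With this sketch, sketch_wcwidth(0x200B) and sketch_wcwidth(0x0300)
both give 0, while sketch_wcwidth(ord("a")) gives 1.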

That file also happens to be in xenocara/app/xterm/wcwidth.c, so that
was the behavior in xterm until (I assume) it started using the system
version.

The database file in OpenBSD is just too old; the same problematic file
was fixed in FreeBSD in 2006, see:
http://code.bsd64.org/cvsweb/freebsd/src/share/mklocale/UTF-8.src
