bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-07 Thread Valdis Vītoliņš
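Thanks for the clarification! I tested it with this Bash script:

    # Count characters (wc -m) and newlines (wc -l) in mylog.
    # Reading from stdin makes wc print only the number, so no cut is needed.
    chars=$(wc -m < mylog)
    lines=$(wc -l < mylog)
    # vim's :%s/.//gn does not count newlines, so subtract one per line.
    let "chars = chars - lines"
    echo $chars

and got the same number as reported by vim's :%s/.//gn (which is where my confusion came from). Hopefully this bug description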


2015-06-06 Thread Valdis Vītoliņš
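Note that UTF-8 characters can be counted by counting the bytes whose bit pattern is 0xxxxxxx or 11xxxxxx, i.e. every byte except continuation bytes (10xxxxxx), because each character starts with exactly one such lead byte: https://en.wikipedia.org/wiki/UTF-8#Description
So the general logic should be that, if:
a) the locale setting is UTF-8 (e.g. LANG=xx_XX.UTF-8), or
b) the file starts with the UTF-8 byte-order mark 0xEF 0xBB 0xBF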
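The byte-pattern rule can be tried directly from the shell. This is a minimal sketch, assuming valid UTF-8 input and a tr that accepts octal escapes: delete every continuation byte (0x80-0xBF, octal \200-\277) in the C locale and count the bytes that remain. count_utf8_chars is a hypothetical helper name, not a coreutils command.

```shell
#!/bin/sh
# Sketch: count UTF-8 characters by counting every byte that is NOT a
# continuation byte (10xxxxxx, i.e. 0x80-0xBF). Each lead byte
# (0xxxxxxx or 11xxxxxx) starts exactly one character, so the number of
# remaining bytes equals the character count, assuming valid UTF-8.
# count_utf8_chars is a hypothetical helper, not part of coreutils.
count_utf8_chars() {
    # LC_ALL=C makes tr work on raw bytes; \200-\277 is octal for 0x80-0xBF.
    # The final tr strips the padding that some wc implementations print.
    LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}
```

For example, count_utf8_chars < mylog should agree with wc -m < mylog run under a UTF-8 locale, as long as mylog is valid UTF-8. Like wc -m, it counts newlines as characters.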