Thanks for the clarification!
I tested it with a Bash script:
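chars=$(wc -m < mylog)      # characters, including newlines
lines=$(wc -l < mylog)      # number of newline characters
chars=$((chars - lines))    # drop newlines; "let chars=$chars - $lines" fails on the unquoted spaces
echo "$chars"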
and got the same number as reported by the Vim command
:%s/.//gn
(which was the source of my confusion).
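The difference comes from newlines: Vim's `.` does not match an end-of-line, while `wc -m` counts each newline as a character. The same figure can be obtained in one pass by deleting newlines before counting. This is a sketch, assuming a UTF-8 locale so that `wc -m` counts characters rather than bytes; the function name `chars_no_newlines` is hypothetical.

```shell
#!/usr/bin/env bash
# Count characters in a file, excluding newlines, to match Vim's :%s/.//gn.
# Assumes a UTF-8 locale, so wc -m counts characters rather than bytes.
# Deleting '\n' byte-wise is safe for UTF-8: byte 0x0A never occurs
# inside a multibyte sequence.
chars_no_newlines() {
  local n
  n=$(tr -d '\n' < "$1" | wc -m)
  echo $((n))   # arithmetic expansion strips any padding added by wc
}

# Usage: chars_no_newlines mylog
```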
Hopefully this bug description
2015-06-06 21:49:16 +0300, Valdis Vītoliņš:
Note that UTF-8 characters can be counted by counting the bytes with bit
patterns 0xxxxxxx or 11xxxxxx, i.e. every byte except continuation bytes (10xxxxxx):
https://en.wikipedia.org/wiki/UTF-8#Description
So the general logic should be to count characters this way if:
a) the locale is UTF-8 (e.g. LANG=xx_XX.UTF-8), or
b) the file starts with the UTF-8 byte order mark 0xEF 0xBB 0xBF (0xFE 0xFF is the UTF-16 big-endian BOM)
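The byte-counting rule quoted above can be sketched in a few lines of Bash. This is an illustration only, not how GNU wc works (`wc -m` decodes characters according to the current locale); the function name `utf8_chars` is hypothetical, and the sketch assumes well-formed UTF-8 input. Like `wc -m`, it counts a newline as one character.

```shell
#!/usr/bin/env bash
# Count UTF-8 characters by counting lead bytes only: every byte matching
# 0xxxxxxx (ASCII) or 11xxxxxx (first byte of a multibyte sequence).
# Continuation bytes 10xxxxxx (0x80-0xBF, octal \200-\277) are deleted first.
# Assumes valid UTF-8; malformed sequences are not detected.
utf8_chars() {
  local n
  # LC_ALL=C makes tr treat its input as raw bytes.
  n=$(LC_ALL=C tr -d '\200-\277' | wc -c)
  echo $((n))   # arithmetic expansion strips any padding added by wc
}

# Usage: utf8_chars < mylog
```

The same three sizes from the report can be reproduced with printf: 'a\304\201' (3 bytes), '\304\201\304\201' (4 bytes) and '\342\202\254\342\202\254' (6 bytes) each contain two characters.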
tag 20751 notabug
close 20751
stop
On 06/06/15 19:49, Valdis Vītoliņš wrote:
Version: wc (GNU coreutils) 8.21
When 'wc -m' is invoked, it should print the character count, but it counts
UTF-8 encoded characters incorrectly. The attached files have 3, 4 and 6
bytes in them, but each has only two UTF-8 characters.
You mailed submit@debbugs without specifying a Package:, so your bug
report ended up on the help-debbugs list. I have reassigned it to
coreutils. (Please note there is no wc package.)
(My mailer is messing up the UTF-8 characters in your report.
Interested parties can see the original at