Hi,

Eric Pruitt wrote on Sun, Feb 12, 2017 at 10:21:11PM -0800:

> I think _that_ is a bug: the reason non-breaking spaces exist
> is so programs do not separate words at that character
> (https://en.wikipedia.org/wiki/Non-breaking_space). GNU fmt
> respects non-breaking spaces and handles them accordingly:
>
>     ~$ fmt --version | head -n1
>     fmt (GNU coreutils) 8.25
>     ~$ printf "XXXX XXXXXXXXXXXXXXX\u00a0XXX XXX" | fmt -w 20
>     XXXX
>     XXXXXXXXXXXXXXX XXX
>     XXX
>     ~$ printf "XXXX XXXXXXXXXXXXXXX XXX XXX" | fmt -w 20
>     XXXX
>     XXXXXXXXXXXXXXX
>     XXX XXX

That is kind of hard to deny.  So here is a fix.

Before:

  $ printf "123456789 123456789\xc2\xa0x23456789 123456789" | fmt -w 20
  123456789 123456789
  x23456789 123456789

After:

  $ printf "123456789 123456789\xc2\xa0x23456789 123456789" | fmt -w 20
  123456789
  123456789 x23456789
  123456789

Doesn't break regression tests.

While we cannot handle each and every corner case of arcane Unicode
characters and should not even try, U+00A0 NO-BREAK SPACE seems
common enough to me to be handled even in such a simple tool,
in particular given that the fix is trivial.

OK?
  Ingo


Index: fmt.c
===================================================================
RCS file: /cvs/src/usr.bin/fmt/fmt.c,v
retrieving revision 1.36
diff -u -p -r1.36 fmt.c
--- fmt.c	7 Jan 2016 18:02:43 -0000	1.36
+++ fmt.c	19 Feb 2017 20:18:45 -0000
@@ -468,7 +468,7 @@ process_stream(FILE *stream, const char
 				    tab_width - line_width;
 			else if ((wcw = wcwidth(wc)) == -1)
 				wcw = 1;
-			if (iswblank(wc)) {
+			if (iswblank(wc) && wc != 0xa0) {
 				/* Skip whitespace at start of line. */
 				if (word_length == 0) {
 					wordp += wcl;
