Hi,

Eric Pruitt wrote on Sun, Feb 12, 2017 at 10:21:11PM -0800:

> I think _that_ is a bug: the reason non-breaking spaces exist
> is so programs do not separate words at that character
> (https://en.wikipedia.org/wiki/Non-breaking_space). GNU fmt
> respects non-breaking spaces and handles them accordingly:
> 
>     ~$ fmt --version | head -n1
>     fmt (GNU coreutils) 8.25
>     ~$ printf "XXXX XXXXXXXXXXXXXXX\u00a0XXX XXX" | fmt -w 20
>     XXXX
>     XXXXXXXXXXXXXXX XXX
>     XXX
>     ~$ printf "XXXX XXXXXXXXXXXXXXX XXX XXX" | fmt -w 20
>     XXXX
>     XXXXXXXXXXXXXXX
>     XXX XXX

That is kind of hard to deny.

So here is a fix.

Before:

 $ printf "123456789 123456789\xc2\xa0x23456789 123456789" | fmt -w 20
123456789 123456789
x23456789 123456789

After:

 $ printf "123456789 123456789\xc2\xa0x23456789 123456789" | fmt -w 20
123456789
123456789 x23456789
123456789

The existing regression tests still pass.

While we cannot handle each and every corner case of arcane Unicode
characters and should not even try, U+00A0 NO-BREAK SPACE seems
common enough to me to be handled even in such a simple tool, in
particular given that the fix is trivial.
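
For illustration only, here is a standalone sketch of the predicate the
diff below switches to. The helper names is_break_blank() and
count_break_points() are hypothetical and do not exist in fmt.c; only
the condition "iswblank(wc) && wc != 0xa0" comes from the patch.
Whether iswblank() reports U+00A0 as blank depends on the platform and
locale, so the explicit comparison is what makes the result uniform.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

#define NBSP 0xa0	/* U+00A0 NO-BREAK SPACE */

/*
 * Hypothetical helper, not part of fmt.c: nonzero if a line may be
 * broken at wc.  Same condition as in the patch.
 */
static int
is_break_blank(wint_t wc)
{
	return iswblank(wc) && wc != NBSP;
}

/*
 * Hypothetical helper: count the characters in s at which a line
 * break would be permitted.
 */
static size_t
count_break_points(const wchar_t *s)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s != L'\0'; s++)
		if (is_break_blank((wint_t)*s))
			n++;
	return n;
}
```

Applied to the test string from the example above, the two ordinary
spaces remain break candidates, while the no-break space stays part of
its word.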

OK?
  Ingo


Index: fmt.c
===================================================================
RCS file: /cvs/src/usr.bin/fmt/fmt.c,v
retrieving revision 1.36
diff -u -p -r1.36 fmt.c
--- fmt.c       7 Jan 2016 18:02:43 -0000       1.36
+++ fmt.c       19 Feb 2017 20:18:45 -0000
@@ -468,7 +468,7 @@ process_stream(FILE *stream, const char 
                                            tab_width - line_width;
                                else if ((wcw = wcwidth(wc)) == -1)
                                        wcw = 1;
-                               if (iswblank(wc)) {
+                               if (iswblank(wc) && wc != 0xa0) {
                                        /* Skip whitespace at start of line. */
                                        if (word_length == 0) {
                                                wordp += wcl;
