Re: fmt replaces utf8 spaces for ascii ones

2017-02-13 Thread Walter Alejandro Iglesias
On Sun, Feb 12, 2017 at 10:21:11PM -0800, Eric Pruitt wrote:
> Unfortunately I do not have access to an OpenBSD machine to verify
> whether or not its fmt does the correct thing.

By the way, if you try your example on OpenBSD, take into account that
OpenBSD's printf won't recognize \u00a0.  Use '\xc2\xa0' instead.
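
A possible way to write the test input so it does not depend on which
escapes a given printf accepts: POSIX only requires printf to understand
octal escapes in the format string, so the two bytes of the UTF-8
no-break space can be spelled \302\240.  This is only a sketch; the
function name nbsp_sample is made up for illustration.

```sh
# nbsp_sample is a hypothetical helper name, not part of any tool.
# \302\240 is the octal spelling of the bytes 0xC2 0xA0, which are the
# UTF-8 encoding of U+00A0 (NO-BREAK SPACE).
nbsp_sample() {
    printf ' XXX\302\240XXX XXX'
}

# Feed it to the formatter under test, for example:
#   nbsp_sample | LC_CTYPE=en_US.UTF-8 fmt -w 20
```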

I tried your example on a Linux machine and got the same results you did.
But I did it mostly because I was curious about another difference: the
GNU version puts the newline at the column given by -w, so in this case
the resulting line is 19 characters wide.  The OpenBSD version puts the
newline at the following column, giving you a line 20 characters wide.
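
To compare the two implementations without counting by eye, one could
print the length of the longest output line.  A rough sketch, assuming
an awk whose length() counts characters in the current locale (in the C
locale it counts bytes); the helper name max_width is made up.

```sh
# max_width is a hypothetical helper: it reads lines on stdin and
# prints the length of the longest one (0 if there is no input).
max_width() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Example use with the formatter under test:
#   some_command_printing_text | fmt -w 20 | max_width
```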

Back to the original topic.  What made me unsure whether this was a
'feature' or a 'bug' was the man page.  The following two paragraphs made
me think that converting all spaces to ASCII ones could be intended as a
practical solution:

 fmt is meant to format mail messages prior to sending, but may also
 be useful for other simple tasks...

 The program was designed to be simple and fast – for more complex
 operations, the standard text processors are likely to be more
 appropriate.



Re: fmt replaces utf8 spaces for ascii ones

2017-02-12 Thread Walter Alejandro Iglesias
On Sun, Feb 12, 2017 at 10:21:11PM -0800, Eric Pruitt wrote:
> On Sun, Feb 12, 2017 at 09:21:37PM +0100, Walter Alejandro Iglesias wrote:
> > After investigating a bit I realized that what I called a utf8 space is a
> > 'nobreakspace', so it's OK for fmt to replace them with ascii ones.  I
> > asked a stupid question.  Sorry!
> 
> If that's the behavior you see, I think _that_ is a bug: the reason
> non-breaking spaces exist is so programs do not separate words at that
> character (https://en.wikipedia.org/wiki/Non-breaking_space). GNU fmt
> respects non-breaking spaces and handles them accordingly:
> 
> ~$ fmt --version | head -n1
> fmt (GNU coreutils) 8.25
> ~$ printf " XXX\u00a0XXX XXX" | fmt -w 20
> 
> XXX XXX
> XXX
> ~$ printf " XXX XXX XXX" | fmt -w 20
> 
> XXX
> XXX XXX
> 
> Unfortunately I do not have access to an OpenBSD machine to verify
> whether or not its fmt does the correct thing.
> 
> Eric


OpenBSD 6.0-current (GENERIC.MP) #0: Sat Feb 11 09:48:19 CET 2017
morl...@server.roquesor.com:/usr/src/sys/arch/amd64/compile/GENERIC.MP

$ printf " XXX\u00a0XXX XXX" | LC_CTYPE=en_US.UTF-8 fmt -w 20
 XXX
XXX XXX
$ printf " XXX XXX XXX" | LC_CTYPE=en_US.UTF-8 fmt -w 20
 XXX
XXX XXX
$ printf " XXX\u00a0XXX XXX" | LC_CTYPE=C fmt -w 20

XXX XXX
XXX
$ printf " XXX XXX XXX" | LC_CTYPE=C fmt -w 20
 XXX
XXX XXX



Thanks Eric.



Re: fmt replaces utf8 spaces for ascii ones

2017-02-12 Thread Eric Pruitt
On Sun, Feb 12, 2017 at 09:21:37PM +0100, Walter Alejandro Iglesias wrote:
> After investigating a bit I realized that what I called a utf8 space is a
> 'nobreakspace', so it's OK for fmt to replace them with ascii ones.  I
> asked a stupid question.  Sorry!

If that's the behavior you see, I think _that_ is a bug: the reason
non-breaking spaces exist is so programs do not separate words at that
character (https://en.wikipedia.org/wiki/Non-breaking_space). GNU fmt
respects non-breaking spaces and handles them accordingly:

~$ fmt --version | head -n1
fmt (GNU coreutils) 8.25
~$ printf " XXX\u00a0XXX XXX" | fmt -w 20

XXX XXX
XXX
~$ printf " XXX XXX XXX" | fmt -w 20

XXX
XXX XXX

Unfortunately I do not have access to an OpenBSD machine to verify
whether or not its fmt does the correct thing.

Eric



Re: fmt replaces utf8 spaces for ascii ones

2017-02-12 Thread Walter Alejandro Iglesias
After investigating a bit I realized that what I called a utf8 space is a
'nobreakspace', so it's OK for fmt to replace them with ascii ones.  I
asked a stupid question.  Sorry!



fmt replaces utf8 spaces for ascii ones

2017-02-11 Thread Walter Alejandro Iglesias
Hello,

Probably Ingo will know about this.

fmt, when using a utf8 locale, replaces utf8 spaces with ascii ones (I use
utf8 spaces in html to get web browsers to render a double space at the end
of a sentence).  This doesn't happen with LC_CTYPE=C.

Is this a feature or a bug?
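
One way to check the replacement without relying on how a terminal draws
the characters is to count the no-break spaces before and after fmt.
This is a sketch under the assumption that the "utf8 space" is U+00A0,
encoded as the bytes 0xC2 0xA0; the helper name count_nbsp is made up.

```sh
# count_nbsp is a hypothetical helper: it reads stdin and prints how
# many times the byte pair c2 a0 (UTF-8 for U+00A0) occurs.
count_nbsp() {
    # od prints one hex byte per field; tr puts each byte on its own
    # line; awk counts every "a0" that directly follows a "c2".
    od -An -v -tx1 | tr -s ' \t' '\n' | awk '
        prev == "c2" && $0 == "a0" { n++ }
        NF { prev = $0 }
        END { print n + 0 }'
}

# Example: compare the count in the input with the count after fmt.
#   printf ' XXX\302\240XXX XXX' | count_nbsp
#   printf ' XXX\302\240XXX XXX' | LC_CTYPE=en_US.UTF-8 fmt -w 20 | count_nbsp
```

If the second count is lower than the first, the formatter has replaced
or dropped the no-break spaces.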