Re: fmt replaces utf8 spaces for ascii ones
On Sun, Feb 12, 2017 at 10:21:11PM -0800, Eric Pruitt wrote:
> Unfortunately I do not have access to an OpenBSD machine to verify
> whether or not its fmt does the correct thing.

By the way, if you try your example on OpenBSD, keep in mind that the OpenBSD printf won't recognize \u00a0. Use '\xc2\xa0' instead.

I tried your example on a Linux machine and got the same results as you. But I did it mostly because I was curious about the other difference: the GNU version inserts the newline 'at' the column number given by -w, which in this case gives a line 19 characters wide. The OpenBSD version breaks the line at the following character, giving a line 20 characters wide.

Back to the original topic. What made me hesitate between 'feature' and 'bug' was the man page. The following two passages made me think that converting all spaces to ASCII could be intended as a practical solution:

  fmt is meant to format mail messages prior to sending, but may also be useful for other simple tasks...

  The program was designed to be simple and fast – for more complex operations, the standard text processors are likely to be more appropriate.
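For anyone reproducing this, here is a minimal sketch, assuming a POSIX sh, of a portable way to produce the non-breaking space. Octal escapes are the only numeric escapes POSIX requires in a printf format string, so '\302\240' (the UTF-8 bytes 0xC2 0xA0) avoids depending on \u or \x support. The helper names nbsp, hex and widest are made up for illustration, and widest reports whatever awk's length() counts, which may be bytes rather than characters depending on the awk implementation and locale.

```sh
#!/bin/sh
# Emit U+00A0 (non-breaking space) as its two UTF-8 bytes, using octal escapes.
nbsp() { printf '\302\240'; }

# Hex-dump stdin as one unbroken string (illustrative helper).
hex() { od -An -tx1 | tr -d ' \n'; }

# Print the length of the longest input line (illustrative helper).
widest() { awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }'; }

nbsp | hex; echo     # c2a0
```

Something like `printf ' XXX%sXXX XXX\n' "$(nbsp)" | fmt -w 20 | widest` could then be used to compare where each fmt implementation breaks the line.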
Re: fmt replaces utf8 spaces for ascii ones
On Sun, Feb 12, 2017 at 10:21:11PM -0800, Eric Pruitt wrote:
> On Sun, Feb 12, 2017 at 09:21:37PM +0100, Walter Alejandro Iglesias wrote:
> > After investigating a bit I realized that what I called utf8 space is a
> > 'nobreakspace' so it's OK for fmt to replace them with ASCII ones. I asked
> > a stupid question. Sorry!
>
> If that's the behavior you see, I think _that_ is a bug: the reason
> non-breaking spaces exist is so programs do not separate words at that
> character (https://en.wikipedia.org/wiki/Non-breaking_space). GNU fmt
> respects non-breaking spaces and handles them accordingly:
>
> ~$ fmt --version | head -n1
> fmt (GNU coreutils) 8.25
> ~$ printf " XXX\u00a0XXX XXX" | fmt -w 20
>
> XXX XXX
> XXX
> ~$ printf " XXX XXX XXX" | fmt -w 20
>
> XXX
> XXX XXX
>
> Unfortunately I do not have access to an OpenBSD machine to verify
> whether or not its fmt does the correct thing.
>
> Eric

OpenBSD 6.0-current (GENERIC.MP) #0: Sat Feb 11 09:48:19 CET 2017
morl...@server.roquesor.com:/usr/src/sys/arch/amd64/compile/GENERIC.MP

$ printf " XXX\u00a0XXX XXX" | LC_CTYPE=en_US.UTF-8 fmt -w 20
XXX XXX XXX
$ printf " XXX XXX XXX" | LC_CTYPE=en_US.UTF-8 fmt -w 20
XXX XXX XXX
$ printf " XXX\u00a0XXX XXX" | LC_CTYPE=C fmt -w 20
XXX XXX XXX
$ printf " XXX XXX XXX" | LC_CTYPE=C fmt -w 20
XXX XXX XXX

Thanks Eric.
Re: fmt replaces utf8 spaces for ascii ones
On Sun, Feb 12, 2017 at 09:21:37PM +0100, Walter Alejandro Iglesias wrote:
> After investigating a bit I realized that what I called utf8 space is a
> 'nobreakspace' so it's OK for fmt to replace them with ASCII ones. I asked
> a stupid question. Sorry!

If that's the behavior you see, I think _that_ is a bug: the reason non-breaking spaces exist is so programs do not separate words at that character (https://en.wikipedia.org/wiki/Non-breaking_space). GNU fmt respects non-breaking spaces and handles them accordingly:

~$ fmt --version | head -n1
fmt (GNU coreutils) 8.25
~$ printf " XXX\u00a0XXX XXX" | fmt -w 20

XXX XXX
XXX
~$ printf " XXX XXX XXX" | fmt -w 20

XXX
XXX XXX

Unfortunately I do not have access to an OpenBSD machine to verify whether or not its fmt does the correct thing.

Eric
Re: fmt replaces utf8 spaces for ascii ones
After investigating a bit I realized that what I called utf8 space is a 'nobreakspace' so it's OK for fmt to replace them with ASCII ones. I asked a stupid question. Sorry!
fmt replaces utf8 spaces for ascii ones
Hello,

Probably Ingo will know about this. fmt, when using a utf8 locale, replaces utf8 spaces with ASCII ones (I use utf8 spaces in HTML to get web browsers to render a double space at the end of a sentence). This doesn't happen with LC_CTYPE=C.

Is this a feature or a bug?
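One way to check the behavior described above on a given system is to test whether the non-breaking space survives a pass through fmt under each locale. What follows is only a sketch, assuming a POSIX sh and that the space in question is U+00A0 (bytes 0xC2 0xA0 in UTF-8); has_nbsp and check_locale are hypothetical helper names, and the locale names passed to check_locale must exist on the machine.

```sh
#!/bin/sh
# Succeed if stdin contains the UTF-8 encoding of U+00A0 (0xC2 0xA0).
# grep runs in the C locale so the pattern is matched byte by byte.
has_nbsp() { LC_ALL=C grep -q "$(printf '\302\240')"; }

# Run fmt under the LC_CTYPE given as $1 and report what happened to the
# non-breaking space. Note that an exported LC_ALL would override LC_CTYPE.
check_locale() {
    if printf 'XXX\302\240XXX XXX\n' | LC_CTYPE="$1" fmt -w 20 | has_nbsp; then
        echo "$1: non-breaking space preserved"
    else
        echo "$1: non-breaking space replaced"
    fi
}
```

Calling `check_locale C` and then `check_locale en_US.UTF-8` would show whether the two locales differ in the way the original report describes.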