Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-07 Thread Paul Eggert
On 12/7/19 5:56 AM, Bruno Haible wrote:
> So, we should write
>   L’Oréal and L’chaim with U+2019
>   and OʼHara, OʼConnor with U+02BC.

I wouldn’t use U+02BC MODIFIER LETTER APOSTROPHE in the Anglicized form of Irish
names. The Irish-language spelling is of course quite different, but even then I
wouldn’t use U+02BC; e.g., the Anglicized “O’” corresponds to the Irish “Ó ”.

A quick Google search suggests that U+02BC is used for ejective consonants in
IPA, and in some languages such as Bodo, but I do not know why U+02BC would be
used for Anglicized Irish names.



Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-07 Thread Pádraig Brady

On 07/12/2019 04:10, Bruno Haible wrote:
> Wes Hurd wrote:
>> What I meant about smart quotes being dangerous was, if copying the output
>> text that contains smart quotes to use somewhere else (especially in code),
>> the smart quotes have to be manually replaced which is tedious for the user
>> (programmer).
>
> It's quite the opposite: The smart quote characters are safer because they
> don't interfere with the quoting of a shell or programming language. For
> example, in my music directory, I have a directory T’pau.
>
> $ ls T’pau
> China in your hand.wav
>
> If the directory was named T'pau and I were to write the command
>
> $ ls T'pau
>
> it would hang. And
>
> $ ls 'T'pau'
>
> would hang as well. *This* is dangerous.
>
>> The user may not even see that smart quotes are being used unless there is
>> a breaking error.
>
> But the smart quotes look different than an apostrophe! No confusion is
> possible.
> Your argument was correct 40 years ago, when
>   1. the shape of the apostrophe was given by US-ASCII and was not vertical [1],
>   2. a character cell had so few pixels in height and width that smart quotes
>      and apostrophes could not be rendered differently.
>
> The reason why quotearg.c uses smart quotes is explained in [2].
>
> Bruno
>
> [1] https://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html
> [2] https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html




Agreeing with Bruno here.

Note gnulib supports shell quoting, which coreutils uses to safely quote file 
names:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v8.24-71-g08e8fd7
Also coreutils ls leverages this to safely output file names by default.

Note also re apostrophe, word boundaries may be a consideration.
As I previously tweeted:

  It's awkward for file names to use shell quote
  It’s awkward for word regex to use right quote (\u2019)
  Itʼs best to use apostrophe modifier (\u02BC)

cheers,
Pádraig
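
The word-boundary remark can be checked with a small Python sketch. It is not from the thread, and `split_words` is a hypothetical helper. It assumes Python's `re` module, where `\w` matches Unicode letters: U+02BC is classed as a modifier letter, while U+0027 and U+2019 are punctuation. Other regex engines may classify these characters differently.

```python
import re
import unicodedata

# The same word written with three different apostrophe-like characters.
candidates = {
    "U+0027 APOSTROPHE": "It's",
    "U+2019 RIGHT SINGLE QUOTATION MARK": "It\u2019s",
    "U+02BC MODIFIER LETTER APOSTROPHE": "It\u02bcs",
}

def split_words(text):
    """Return the runs of word characters that the regex \\w+ finds in text."""
    return re.findall(r"\w+", text)

for label, text in candidates.items():
    # Index 2 is the apostrophe-like character in each sample.
    print(label, unicodedata.category(text[2]), split_words(text))
```

Only the U+02BC spelling stays together as one word in this sketch; the other two are split at the quote character.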



Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-07 Thread Bruno Haible
Pádraig Brady wrote:
>It’s awkward for word regex to use right quote (\u2019)
>Itʼs best to use apostrophe modifier (\u02BC)

Unicode.org recommends:
  * U+2019 is the preferred character for a punctuation apostrophe,
  * U+02BC is a glottal stop, used by many languages as a letter of their
alphabets.

So, we should write
  L’Oréal and L’chaim with U+2019
  and OʼHara, OʼConnor with U+02BC.

Bruno




Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-06 Thread Bruno Haible
Wes Hurd wrote:
> What I meant about smart quotes being dangerous was, if copying the output
> text that contains smart quotes to use somewhere else (especially in code),
> the smart quotes have to be manually replaced which is tedious for the user
> (programmer).

It's quite the opposite: The smart quote characters are safer because they
don't interfere with the quoting of a shell or programming language. For
example, in my music directory, I have a directory T’pau.

$ ls T’pau
China in your hand.wav

If the directory was named T'pau and I were to write the command

$ ls T'pau

it would hang. And

$ ls 'T'pau'

would hang as well. *This* is dangerous.

> The user may not even see that smart quotes are being used unless there is
> a breaking error.

But the smart quotes look different than an apostrophe! No confusion is
possible.
Your argument was correct 40 years ago, when
  1. the shape of the apostrophe was given by US-ASCII and was not vertical [1],
  2. a character cell had so few pixels in height and width that smart quotes
     and apostrophes could not be rendered differently.

The reason why quotearg.c uses smart quotes is explained in [2].

Bruno

[1] https://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html
[2] https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
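
The quoting behaviour described in this message can be approximated with Python's `shlex` module, which tokenizes text in a POSIX-shell-like way. This is a sketch, not part of the original message, and `tokenize` is a hypothetical helper. Where an interactive shell would keep waiting for the closing quote (the "hang"), `shlex.split` raises `ValueError` instead.

```python
import shlex

def tokenize(command):
    """Split a command line roughly like a POSIX shell.

    Returns None when a quote is left unterminated.
    """
    try:
        return shlex.split(command)
    except ValueError:
        # shlex reports the unterminated quote as an error; an interactive
        # shell would instead keep prompting for more input.
        return None

# U+2019 is an ordinary character to the tokenizer.
print(tokenize("ls T\u2019pau"))
# An ASCII apostrophe opens a quoted string that never closes.
print(tokenize("ls T'pau"))
print(tokenize("ls 'T'pau'"))
# shlex.quote produces a form that is safe to paste back into a shell.
safe = shlex.quote("T'pau")
print(safe, tokenize("ls " + safe))
```

The last two lines show the usual remedy for a name containing an ASCII apostrophe: quote it mechanically rather than by hand.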




Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-06 Thread Paul Eggert

On 12/6/19 6:52 PM, Wes Hurd wrote:
> What I meant about smart quotes being dangerous was, if copying the
> output text that contains smart quotes to use somewhere else (especially
> in code), the smart quotes have to be manually replaced which is tedious
> for the user (programmer).
> The user may not even see that smart quotes are being used unless there
> is a breaking error.


That sort of problem can occur with any quoting style. For example:

$ cp '\0' 'xxx'
cp: cannot stat '\0': No such file or directory

If I cut the '\0' (including the apostrophes) and then paste it into a C 
program, like this:


char *filename = '\0';

the program will compile (it's valid C code!), but it won't do what I 
want. If I paste '\0' into a Python program, I'll get a different string 
(it's a valid Python string too) but it won't equal the filename I gave 
to 'cp'.


Arguably curved quotes would be *safer* than apostrophes here, because 
they would require the programmer to think about quoting when the 
programmer is doing something silly like copying quoted strings blindly 
into a program.


Admittedly this example is contrived, but there's lots of non-contrived 
examples like it.
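
The Python half of this example is easy to verify. A minimal sketch, not from the original message: the file name passed to `cp` is the two characters backslash and zero, while the pasted literal `'\0'` is a single NUL character. The variable names are illustrative only.

```python
# The file name given to cp on the command line: a backslash followed by "0".
filename = "\\0"

# What you get by pasting the quoted text '\0' from the error message
# straight into Python source: an octal escape for the NUL character.
pasted = '\0'

print(len(filename), len(pasted))   # the lengths differ
print(filename == pasted)           # the strings are not equal
print(repr(filename), repr(pasted))

# A raw string literal keeps the backslash and matches the file name.
raw = r'\0'
print(raw == filename)
```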




Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-06 Thread Wes Hurd
What I meant about smart quotes being dangerous was, if copying the output
text that contains smart quotes to use somewhere else (especially in code),
the smart quotes have to be manually replaced which is tedious for the user
(programmer).
The user may not even see that smart quotes are being used unless there is
a breaking error.



On Thu, Dec 5, 2019 at 12:07 PM Tim Rühsen wrote:

> On 12/5/19 4:12 PM, Wes Hurd wrote:
> > Hi,
> >
> > It seems GNUlib quote encoding goes to Unicode smart quotes, which causes
> > command-line program output to be in smart quotes.
> > Smart quotes are dangerous for programmers and technical users, and should
> > be avoided in program output.
> >
> > Originally noticed with wget -
> > https://savannah.gnu.org/bugs/index.php?57356#comment1
> >
> > Can you change it to use only regular typed " quotes , at least with STDOUT
> > / STDERR ?
>
> All the GNU tools support localization. And there seems to be some kind
> of inconsistency.
>
>
> $ LANGUAGE=de cp
> cp: Fehlender Dateioperand
> „cp --help“ liefert weitere Informationen.
>
>
> $ LANGUAGE=de wget
> wget: URL fehlt
> Aufruf: wget [OPTION]... [URL] …
>
> »wget --help« gibt weitere Informationen.
>
>
> Which one is correct ?
>
> Maybe someone here can bring light into this or give us a pointer where
> to get clarification.
>
> Wes, maybe you can elaborate why you think that smart quotes are
> *dangerous* ?
>
> Regards, Tim
>
>


Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-06 Thread Florian Weimer
* Tim Rühsen:

> On 12/5/19 4:12 PM, Wes Hurd wrote:
>> Hi,
>> 
>> It seems GNUlib quote encoding goes to Unicode smart quotes, which causes
>> command-line program output to be in smart quotes.
>> Smart quotes are dangerous for programmers and technical users, and should
>> be avoided in program output.
>> 
>> Originally noticed with wget -
>> https://savannah.gnu.org/bugs/index.php?57356#comment1
>> 
>> Can you change it to use only regular typed " quotes , at least with STDOUT
>> / STDERR ?
>
> All the GNU tools support localization. And there seems to be some kind
> of inconsistency.
>
>
> $ LANGUAGE=de cp
> cp: Fehlender Dateioperand
> „cp --help“ liefert weitere Informationen.
>
>
> $ LANGUAGE=de wget
> wget: URL fehlt
> Aufruf: wget [OPTION]... [URL] …
>
> »wget --help« gibt weitere Informationen.
>
>
> Which one is correct ?

Both are, for Germany and Austria.  »« quotes are in Latin-1, which once
was a decisive advantage.  On the other hand, »« used to be restricted
to professional typesetting.

In Switzerland, «» are used instead of »« quotes, if I recall correctly.

Thanks,
Florian




Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-05 Thread Paul Eggert

On 12/5/19 7:12 AM, Wes Hurd wrote:
> Smart quotes are dangerous for programmers and technical users


Sure, but *all* quotes are dangerous for programmers and technical 
users. :-)


If you don’t want smart quotes, you can set LC_ALL=C in your environment.



Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-05 Thread Tim Rühsen
On 12/5/19 4:12 PM, Wes Hurd wrote:
> Hi,
> 
> It seems GNUlib quote encoding goes to Unicode smart quotes, which causes
> command-line program output to be in smart quotes.
> Smart quotes are dangerous for programmers and technical users, and should
> be avoided in program output.
> 
> Originally noticed with wget -
> https://savannah.gnu.org/bugs/index.php?57356#comment1
> 
> Can you change it to use only regular typed " quotes , at least with STDOUT
> / STDERR ?

All the GNU tools support localization. And there seems to be some kind
of inconsistency.


$ LANGUAGE=de cp
cp: Fehlender Dateioperand
„cp --help“ liefert weitere Informationen.


$ LANGUAGE=de wget
wget: URL fehlt
Aufruf: wget [OPTION]... [URL] …

»wget --help« gibt weitere Informationen.


Which one is correct ?

Maybe someone here can bring light into this or give us a pointer where
to get clarification.

Wes, maybe you can elaborate why you think that smart quotes are
*dangerous* ?

Regards, Tim





GNUlib unicode encoding causes smart quotes to be displayed in program's output

2019-12-05 Thread Wes Hurd
Hi,

It seems GNUlib quote encoding goes to Unicode smart quotes, which causes
command-line program output to be in smart quotes.
Smart quotes are dangerous for programmers and technical users, and should
be avoided in program output.

Originally noticed with wget -
https://savannah.gnu.org/bugs/index.php?57356#comment1

Can you change it to use only regular typed " quotes , at least with STDOUT
/ STDERR ?

Thanks,