Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
On 12/7/19 5:56 AM, Bruno Haible wrote:
> So, we should write
> L’Oréal and L’chaim with U+2019
> and OʼHara, OʼConnor with U+02BC.

I wouldn’t use U+02BC MODIFIER LETTER APOSTROPHE in the Anglicized form
of Irish names. The Irish-language spelling is of course quite
different, but even then I wouldn’t use U+02BC; e.g., the Anglicized
“O’” corresponds to the Irish “Ó ”.

A quick Google search suggests that U+02BC is used for ejective
consonants in IPA, and in some languages such as Bodo, but I do not
know why U+02BC would be used for Anglicized Irish names.
Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
On 07/12/2019 04:10, Bruno Haible wrote:
> Wes Hurd wrote:
>> What I meant about smart quotes being dangerous was, if copying the
>> output text that contains smart quotes to use somewhere else
>> (especially in code), the smart quotes have to be manually replaced
>> which is tedious for the user (programmer).
>
> It's quite the opposite: The smart quote characters are safer because
> they don't interfere with the quoting of a shell or programming
> language.
>
> For example, in my music directory, I have a directory T’pau.
>
>   $ ls T’pau
>   China in your hand.wav
>
> If the directory was named T'pau and I were to write the command
>
>   $ ls T'pau
>
> it would hang. And
>
>   $ ls 'T'pau'
>
> would hang as well. *This* is dangerous.
>
>> The user may not even see that smart quotes are being used unless
>> there is a breaking error.
>
> But the smart quotes look different than an apostrophe! No confusion
> is possible.
>
> Your argument was correct 40 years ago, when
>   1. the shape of the apostrophe was given by US-ASCII and was not
>      vertical [1],
>   2. a character cell had so few pixels in height and width that smart
>      quotes and apostrophes could not be rendered differently.
>
> The reason why quotearg.c uses smart quotes is explained in [2].
>
> Bruno
>
> [1] https://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html
> [2] https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

Agreeing with Bruno here.
Note gnulib supports shell quoting, which coreutils uses
to safely quote file names:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v8.24-71-g08e8fd7
Also coreutils ls leverages this to safely output file names by default.

Note also re apostrophe, word boundaries may be a consideration.
As I previously tweeted:

  It's awkward for file names to use shell quote
  It’s awkward for word regex to use right quote (\u2019)
  Itʼs best to use apostrophe modifier (\u02BC)

cheers,
Pádraig
Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
Pádraig Brady wrote:
> It’s awkward for word regex to use right quote (\u2019)
> Itʼs best to use apostrophe modifier (\u02BC)

Unicode.org recommends:
  * U+2019 is the preferred character for a punctuation apostrophe,
  * U+02BC is a glottal stop, used by many languages as a letter of
    their alphabets.

So, we should write
  L’Oréal and L’chaim with U+2019
and
  OʼHara, OʼConnor with U+02BC.

Bruno
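
The word-boundary point can be checked with a short sketch. It uses Python's `re` and `unicodedata` modules rather than any GNU tool, so it only illustrates how one Unicode-aware regex engine treats the two characters; the helper name `words` is made up for this example.

```python
import re
import unicodedata

RIGHT_QUOTE = "\u2019"          # RIGHT SINGLE QUOTATION MARK
MODIFIER_APOSTROPHE = "\u02bc"  # MODIFIER LETTER APOSTROPHE


def words(text):
    """Return the runs of Unicode word characters found in text."""
    return re.findall(r"\w+", text)


with_quote = "It" + RIGHT_QUOTE + "s"
with_modifier = "It" + MODIFIER_APOSTROPHE + "s"

# U+2019 is punctuation, U+02BC is classified as a letter.
print(unicodedata.category(RIGHT_QUOTE))
print(unicodedata.category(MODIFIER_APOSTROPHE))

# The punctuation apostrophe splits the word; the modifier letter does not.
print(words(with_quote))
print(words(with_modifier))
```

Whether that behaviour is desirable for names such as O’Hara is exactly what the thread disagrees about; the sketch only shows the mechanical difference.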
Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
Wes Hurd wrote:
> What I meant about smart quotes being dangerous was, if copying the output
> text that contains smart quotes to use somewhere else (especially in code),
> the smart quotes have to be manually replaced which is tedious for the user
> (programmer).

It's quite the opposite: The smart quote characters are safer because
they don't interfere with the quoting of a shell or programming
language.

For example, in my music directory, I have a directory T’pau.

  $ ls T’pau
  China in your hand.wav

If the directory was named T'pau and I were to write the command

  $ ls T'pau

it would hang. And

  $ ls 'T'pau'

would hang as well. *This* is dangerous.

> The user may not even see that smart quotes are being used unless there is
> a breaking error.

But the smart quotes look different than an apostrophe! No confusion is
possible.

Your argument was correct 40 years ago, when
  1. the shape of the apostrophe was given by US-ASCII and was not
     vertical [1],
  2. a character cell had so few pixels in height and width that smart
     quotes and apostrophes could not be rendered differently.

The reason why quotearg.c uses smart quotes is explained in [2].

Bruno

[1] https://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html
[2] https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
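
The T’pau example can be approximated without a live shell. The sketch below uses Python's `shlex` module, which follows POSIX-like splitting rules but is not an actual shell, so treat it as an illustration only: where an interactive shell would wait for the closing quote, `shlex.split` raises an error instead. The helper name `try_split` is made up for this example.

```python
import shlex


def try_split(command):
    """Split a command line POSIX-style, reporting unbalanced quotes."""
    try:
        return shlex.split(command)
    except ValueError as exc:
        return "error: " + str(exc)


# U+2019 is an ordinary character to the splitter.
print(try_split("ls T\u2019pau"))

# An ASCII apostrophe opens a quoted string that is never closed.
print(try_split("ls T'pau"))
print(try_split("ls 'T'pau'"))

# Quoting the name explicitly makes it safe to paste back.
safe = shlex.quote("T'pau")
print(safe)
print(try_split("ls " + safe))
```

The last two lines correspond to the kind of shell quoting mentioned elsewhere in the thread: the ASCII apostrophe can be used safely, but only if it is escaped.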
Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
On 12/6/19 6:52 PM, Wes Hurd wrote:
> What I meant about smart quotes being dangerous was, if copying the
> output text that contains smart quotes to use somewhere else
> (especially in code), the smart quotes have to be manually replaced
> which is tedious for the user (programmer).
> The user may not even see that smart quotes are being used unless
> there is a breaking error.

That sort of problem can occur with any quoting style. For example:

  $ cp '\0' 'xxx'
  cp: cannot stat '\0': No such file or directory

If I cut the '\0' (including the apostrophes) and then paste it into a C
program, like this:

  char *filename = '\0';

the program will compile (it's valid C code!), but it won't do what I
want. If I paste '\0' into a Python program, I'll get a different string
(it's a valid Python string too) but it won't equal the filename I gave
to 'cp'.

Arguably curved quotes would be *safer* than apostrophes here, because
they would require the programmer to think about quoting when the
programmer is doing something silly like copying quoted strings blindly
into a program. Admittedly this example is contrived, but there's lots
of non-contrived examples like it.
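
The Python half of this example is easy to reproduce. In the sketch below the variable names are invented for illustration; the raw string stands in for the two-character file name that the shell passed to cp.

```python
# What cp received: the shell's single quotes keep the backslash literal,
# so the file name is two characters, a backslash followed by a zero.
name_given_to_cp = r"\0"

# What results from pasting the quoted text '\0' into Python source:
# the escape sequence is interpreted and yields a single NUL character.
pasted_literal = '\0'

print(len(name_given_to_cp))
print(len(pasted_literal))
print(name_given_to_cp == pasted_literal)
```

Both literals are accepted without complaint, which is the point being made: the ASCII-quoted text pastes cleanly and silently means something else.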
Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
What I meant about smart quotes being dangerous was, if copying the
output text that contains smart quotes to use somewhere else (especially
in code), the smart quotes have to be manually replaced which is tedious
for the user (programmer).
The user may not even see that smart quotes are being used unless there
is a breaking error.

On Thu, Dec 5, 2019 at 12:07 PM Tim Rühsen wrote:

> On 12/5/19 4:12 PM, Wes Hurd wrote:
> > Hi,
> >
> > It seems GNUlib quote encoding goes to Unicode smart quotes, which causes
> > command-line program output to be in smart quotes.
> > Smart quotes are dangerous for programmers and technical users, and should
> > be avoided in program output.
> >
> > Originally noticed with wget -
> > https://savannah.gnu.org/bugs/index.php?57356#comment1
> >
> > Can you change it to use only regular typed " quotes , at least with STDOUT
> > / STDERR ?
>
> All the GNU tools support localization. And there seems to be some kind
> of inconsistency.
>
>
> $ LANGUAGE=de cp
> cp: Fehlender Dateioperand
> „cp --help“ liefert weitere Informationen.
>
>
> $ LANGUAGE=de wget
> wget: URL fehlt
> Aufruf: wget [OPTION]... [URL] …
>
> »wget --help« gibt weitere Informationen.
>
>
> Which one is correct ?
>
> Maybe someone here can bring light into this or give us a pointer where
> to get clarification.
>
> Wes, maybe you can elaborate why you think that smart quotes are
> *dangerous* ?
>
> Regards, Tim
Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
* Tim Rühsen:

> On 12/5/19 4:12 PM, Wes Hurd wrote:
>> Hi,
>>
>> It seems GNUlib quote encoding goes to Unicode smart quotes, which causes
>> command-line program output to be in smart quotes.
>> Smart quotes are dangerous for programmers and technical users, and should
>> be avoided in program output.
>>
>> Originally noticed with wget -
>> https://savannah.gnu.org/bugs/index.php?57356#comment1
>>
>> Can you change it to use only regular typed " quotes , at least with STDOUT
>> / STDERR ?
>
> All the GNU tools support localization. And there seems to be some kind
> of inconsistency.
>
>
> $ LANGUAGE=de cp
> cp: Fehlender Dateioperand
> „cp --help“ liefert weitere Informationen.
>
>
> $ LANGUAGE=de wget
> wget: URL fehlt
> Aufruf: wget [OPTION]... [URL] …
>
> »wget --help« gibt weitere Informationen.
>
>
> Which one is correct ?

Both are, for Germany and Austria. »« quotes are in Latin-1, which once
was a decisive advantage. On the other hand, »« used to be restricted
to professional typesetting.

In Switzerland, «» are used instead of »« quotes, if I recall correctly.

Thanks,
Florian
Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
On 12/5/19 7:12 AM, Wes Hurd wrote:
> Smart quotes are dangerous for programmers and technical users

Sure, but *all* quotes are dangerous for programmers and technical
users. :-)

If you don’t want smart quotes, you can set LC_ALL=C in your
environment.
Re: GNUlib unicode encoding causes smart quotes to be displayed in program's output
On 12/5/19 4:12 PM, Wes Hurd wrote:
> Hi,
>
> It seems GNUlib quote encoding goes to Unicode smart quotes, which causes
> command-line program output to be in smart quotes.
> Smart quotes are dangerous for programmers and technical users, and should
> be avoided in program output.
>
> Originally noticed with wget -
> https://savannah.gnu.org/bugs/index.php?57356#comment1
>
> Can you change it to use only regular typed " quotes , at least with STDOUT
> / STDERR ?

All the GNU tools support localization. And there seems to be some kind
of inconsistency.


$ LANGUAGE=de cp
cp: Fehlender Dateioperand
„cp --help“ liefert weitere Informationen.


$ LANGUAGE=de wget
wget: URL fehlt
Aufruf: wget [OPTION]... [URL] …

»wget --help« gibt weitere Informationen.


Which one is correct ?

Maybe someone here can bring light into this or give us a pointer where
to get clarification.

Wes, maybe you can elaborate why you think that smart quotes are
*dangerous* ?

Regards, Tim
GNUlib unicode encoding causes smart quotes to be displayed in program's output
Hi,

It seems GNUlib quote encoding goes to Unicode smart quotes, which causes
command-line program output to be in smart quotes.
Smart quotes are dangerous for programmers and technical users, and should
be avoided in program output.

Originally noticed with wget -
https://savannah.gnu.org/bugs/index.php?57356#comment1

Can you change it to use only regular typed " quotes , at least with STDOUT
/ STDERR ?

Thanks,