Re: [bug-gnulib] quote characters in stds
> The main point is that it transmits the perception that

Now I understand. Thanks.

> These two paragraphs seem out of place:

I had been thinking of that as referring only to quotation characters, but I see that you are right. Not sure what rms will think, but it does seem cleaner to have two separate sections, so let's try that.

Trying to take both your latest comments into account, now I have the following ...

@node Character set
@section Character set
@cindex character set
@cindex encodings
@cindex ASCII characters
@cindex non-ASCII characters

Sticking to the ASCII character set (plain text, 7-bit characters) is preferred in GNU source code comments, text documents, and other contexts, unless there is good reason to do something else because of the domain at hand.

If you need to use non-ASCII characters, for example to represent names of contributors, you should normally stick with one encoding, as one cannot in general mix encodings reliably.

@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for quotation characters in messages to users: preferably 0x60 (`) for left quotes and 0x27 (') for right quotes. If using ` is unacceptable in your application, other possibilities are using ' for both opening and closing, or 0x22 (") for both opening and closing. It is ok, but not required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and @code{quotearg} modules provide a reasonably straightforward way to support locale-specific quote characters, as well as taking care of other issues, such as quoting a filename that itself contains a quote character. See the Gnulib documentation for usage details.

In any case, the documentation for your program should clearly specify how it does quoting, if different than the preferred method of ` and '. This is especially important if the output of your program is ever likely to be parsed by another program.

Quotation characters are a difficult area in the computing world at this time: there are no true left or right quote characters in ASCII, or even Latin1; the ` character we use was standardized as a grave accent. Latin1 does have paired standalone accents, but it seems wrong in principle to abuse them as quotes. Also, Latin1 is still not universally usable.

Unicode contains the unambiguous quote characters required, and its common encoding UTF-8 is upward compatible with ASCII. However, Unicode and UTF-8 are not universally well-supported, either. This may change over the next few years, and then we will revisit this.

___
bug-gnulib mailing list
bug-gnulib@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnulib
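The character facts the draft relies on can be checked directly. What follows is only a minimal sketch using Python's standard unicodedata module; the helper names describe and utf8_matches_ascii are made up for this illustration and appear nowhere in the thread.

```python
import unicodedata

# The two ASCII characters the draft prefers for quoting in the C locale.
ASCII_QUOTES = {"left": "\x60", "right": "\x27"}

# The unambiguous Unicode quotation marks, absent from ASCII and Latin1.
UNICODE_QUOTES = {"left": "\u2018", "right": "\u2019"}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def utf8_matches_ascii(text: str) -> bool:
    """True if text is pure ASCII and its UTF-8 bytes equal its ASCII bytes."""
    return text.isascii() and text.encode("utf-8") == text.encode("ascii")


if __name__ == "__main__":
    for side, ch in ASCII_QUOTES.items():
        print(side, describe(ch))
    for side, ch in UNICODE_QUOTES.items():
        print(side, describe(ch), ch.encode("utf-8"))
    print(utf8_matches_ascii("`quoted'"))
```

The standard names show why the draft calls this a difficult area: 0x60 is named GRAVE ACCENT and 0x27 is named APOSTROPHE, while the true paired quotes exist only as U+2018 and U+2019, which take three bytes each in UTF-8.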
Re: [bug-gnulib] quote characters in stds
Bruno Haible [EMAIL PROTECTED] writes:

> Five years ago, people made up lists of programs that _do_ work with UTF-8 encoded text files. Today, these programs are uncountable. Instead, people make up lists of programs that _don't_ work with Unicode: http://www.freedesktop.org/wiki/Software_2fBadSoftware

Well, to be fair, lots of programs work OK with UTF-8 in common cases, but mess up when they're thrown something hard. Certainly the list you referred to is woefully incomplete.

I read your email containing accented letters with GNU Emacs 21.4 and Gnus 5.10.6, a combination that supports UTF-8. But I didn't see the accented letters correctly on my screen: I saw ? instead. This is because I was logged in via an ssh xterm window from Debian woody, whose xterm doesn't support UTF-8. Now that Debian sarge is stable I will look into switching to a better xterm, but this will take some of my time (I'm not looking forward to upgrading all my machines to sarge) and even then I'm not sure things will work (the last time I tried uxterm it flaked out on me too often for my comfort).

It's unlikely we'll change RMS's opinion on UTF-8 right now, but I think we could tone down the language a bit without too much trouble. How about if we change "deployed even less widely than Latin1" (which is true some places but not others, at least in my experience) to "still not universally well-supported" (which is the point, after all)? E.g., change this:

  Unicode contains the unambiguous quote characters required, and its common encoding UTF-8 is upward compatible with ASCII. But Unicode and UTF-8 are deployed even less widely than Latin1; it would be premature to require Unicode support for running essentially every GNU program.

to this:

  Unicode contains the unambiguous quote characters required, and its common encoding UTF-8 is upward compatible with ASCII. But Unicode and UTF-8 are still not universally well-supported; it would be premature to require Unicode support for running essentially every GNU program.

One other comment. These two paragraphs seem out of place:

  ASCII should also be preferred in source code comments, text documents, and other contexts, unless there is good reason to do something else because of the domain at hand.

  If you need to use non-ASCII characters, for example to represent names of contributors, you should normally stick with one encoding, as one cannot in general mix encodings reliably.

How about if we create a new section "Non-ASCII characters" and put it before this new "Quote characters" section? That might make the organization clearer.
Re: [bug-gnulib] quote characters in stds
Karl Berry wrote:

> @node Quote characters
> @section Quote characters
> @cindex quote characters
>
> In the C locale, GNU programs should stick to plain ASCII for quotation characters in messages to users: either 0x60 (`) for left quotes and 0x27 (') for right quotes, or ' for both opening and closing, or " (0x22) for both opening and closing. It is ok, but not required, to use locale-specific quotes in other locales.
>
> The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and @code{quotearg} modules provide a reasonably straightforward way to support locale-specific quote characters, as well as taking care of other issues, such as quoting a filename that itself contains a quote character. See the Gnulib documentation for usage details.
>
> ASCII should also be preferred in source code comments, text documents, and other contexts, unless there is good reason to do something else because of the domain at hand.

Agreed.

> If you need to use non-ASCII characters, for example to represent names of contributors, you should normally stick with one encoding, as one cannot in general mix encodings reliably. Latin@tie{}1 is the most widely usable encoding today, after plain ASCII.

This is misleading. In a list of contributors, I often find names like Rafał Maszkowski, Primož Peterlin, Martin Mokrejš, and Владимир Слепнев (Vladimir Slepnev). To represent them, you need Unicode, i.e. the UTF-8 encoding.

> Quotation characters are a difficult area in the computing world at this time: there are no true left or right quote characters in ASCII, or even Latin@tie{}1. Latin@tie{}1 does have paired standalone accents, but it seems wrong in principle to abuse them as quotes. And even Latin@tie{}1 is not universally usable.
>
> Unicode contains the unambiguous quote characters required, and its common encoding UTF-8 is upward compatible with ASCII.

Agreed.

> But Unicode and UTF-8 are deployed less widely than Latin@tie{}1; it would be premature to require Unicode support for running essentially every GNU program.

This has not been true for several years now. The major GUI toolkits, KDE/Qt and GNOME/Gtk, have supported Unicode for several years, and now feature good support not only of Western and CJK languages, but also of Bidi scripts and Indic languages. 'vi' has been UTF-8 enabled since 2001. For more than one year, major Linux distributions like Fedora Core 3 have put users into UTF-8 locales by default. See http://www.cl.cam.ac.uk/~mgk25/unicode.html for more info.

Bruno

PS: The right spelling of the encodings is Latin1 (no dash, no space) and UTF-8 (with a HYPHEN-MINUS in between).
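Both points in this exchange, that some contributor names cannot be written in Latin1 at all and that mixing encodings in one file is unreliable, can be illustrated with Python's built-in codecs. This is only a sketch: the helper names and the sample strings "René" and "Rafał" are assumptions chosen for the demonstration, not data from the thread.

```python
from typing import Optional


def encodable(text: str, encoding: str) -> bool:
    """Report whether every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def decode_or_none(data: bytes, encoding: str) -> Optional[str]:
    """Decode data, or return None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# A hypothetical contributors file where one line was saved as Latin1
# and another as UTF-8.
MIXED = "Ren\u00e9".encode("latin-1") + b"\n" + "Rafa\u0142".encode("utf-8")

if __name__ == "__main__":
    # U+0142 (l with stroke) is outside Latin1 but fine in UTF-8.
    print(encodable("Rafa\u0142", "latin-1"), encodable("Rafa\u0142", "utf-8"))
    # Read as UTF-8, the Latin1 line is an invalid byte sequence.
    print(decode_or_none(MIXED, "utf-8"))
    # Read as Latin1, nothing fails, but the UTF-8 line turns to garbage.
    print(repr(decode_or_none(MIXED, "latin-1")))
```

Neither reading of the mixed file recovers both names: the UTF-8 decoder rejects the file outright, and the Latin1 decoder silently produces two wrong characters in place of the ł.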
Re: [bug-gnulib] quote characters in stds
> This is misleading.

I know, but I'm not sure what to say. Just delete the sentence about Latin1, maybe? I guess it's not really necessary.

> To represent them, you need Unicode, i.e. the UTF-8 encoding.

Yes, but rms has explicitly rejected (in previous email with me) the idea of recommending the use of UTF-8 in any context whatsoever. Sigh.

> This has not been true for several years now.

Well, whether or not it is true, rms will not accept it, so there's no sense arguing it here.

My personal experience is that it is true that Unicode is still considerably less widely usable than Latin1. Sure, Unicode is available in many contexts and systems. But the names in your message, just for example, came through as garbage to me. No doubt I personally could eventually configure everything involved to display it properly, but the point is that it doesn't just work. And I suspect I am using far newer versions of everything than an average user.

> PS: The right spelling of the encodings is Latin1 (no dash, no space)

I'm glad to know that, it's easier to type than @tie{} :). I had mostly seen it with a space. Do you happen to know where the definitive spelling is given? I've poked around the ISO site without success.

Another draft below. I'm not quite sure why ` would ever be unacceptable, and I'm a bit skeptical that it will pass muster with rms, but I'm trying to avoid an argument with standards-mavens. And gcc 4 already does '...'.

Any improved wording and/or backup facts welcome :).

Thanks,
k

@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for quotation characters in messages to users: preferably 0x60 (`) for left quotes and 0x27 (') for right quotes. If using ` is unacceptable in your application, other possibilities are using ' for both opening and closing, or " (0x22) for both opening and closing. It is ok, but not required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and @code{quotearg} modules provide a reasonably straightforward way to support locale-specific quote characters, as well as taking care of other issues, such as quoting a filename that itself contains a quote character. See the Gnulib documentation for usage details.

In any case, the documentation for your program should clearly specify how it does quoting, if different than the preferred method of ` and '. This is especially important if the output of your program is ever likely to be parsed by another program.

ASCII should also be preferred in source code comments, text documents, and other contexts, unless there is good reason to do something else because of the domain at hand.

If you need to use non-ASCII characters, for example to represent names of contributors, you should normally stick with one encoding, as one cannot in general mix encodings reliably.

Quotation characters are a difficult area in the computing world at this time: there are no true left or right quote characters in ASCII, or even Latin1 (the ` character we use is standardized as a grave accent). Latin1 does have paired standalone accents, but it seems wrong in principle to abuse them as quotes. And even Latin1 is not universally usable.

Unicode contains the unambiguous quote characters required, and its common encoding UTF-8 is upward compatible with ASCII. But Unicode and UTF-8 are deployed even less widely than Latin1; it would be premature to require Unicode support for running essentially every GNU program. Perhaps the prevailing situation will change in a few years, and then we will revisit this.
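The quoting policy in the draft can be sketched in a few lines. To be clear, this is not the Gnulib quote or quotearg API, which is a C interface with its own documented behavior; the functions pick_quotes and quote, the locale-name test, and the backslash-escaping rule below are all assumptions made up for illustration.

```python
from typing import Tuple

ASCII_PAIR = ("`", "'")             # preferred pair in the C locale
UNICODE_PAIR = ("\u2018", "\u2019")  # optional pair for UTF-8 locales


def pick_quotes(locale_name: str) -> Tuple[str, str]:
    """Choose opening and closing quotes for a locale name (hypothetical rule).

    The C and POSIX locales always get plain ASCII. Names ending in a UTF-8
    codeset get the Unicode pair; everything else falls back to ASCII.
    """
    if locale_name in ("C", "POSIX"):
        return ASCII_PAIR
    if locale_name.upper().replace("-", "").endswith("UTF8"):
        return UNICODE_PAIR
    return ASCII_PAIR


def quote(text: str, locale_name: str = "C") -> str:
    """Quote text for a user message, escaping any embedded closing quote.

    Assumed escaping rule: backslashes are doubled, and the closing quote
    character is preceded by a backslash, so the result stays parseable.
    """
    left, right = pick_quotes(locale_name)
    escaped = text.replace("\\", "\\\\").replace(right, "\\" + right)
    return f"{left}{escaped}{right}"


if __name__ == "__main__":
    print(quote("file.txt"))
    print(quote("it's"))
    print(quote("it's", "en_US.UTF-8"))
```

The second call shows the problem the draft mentions: a filename that itself contains the closing quote has to be escaped somehow, and whatever rule a program picks is exactly what its documentation should spell out if other programs will parse the output.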
Re: [bug-gnulib] quote characters in stds
Karl Berry wrote:

> Yes, but rms has explicitly rejected (in previous email with me) the idea of recommending the use of UTF-8 in any context whatsoever. Sigh.

Sigh. What you wrote there:

  If you need to use non-ASCII characters, for example to represent names of contributors, you should normally stick with one encoding, as one cannot in general mix encodings reliably.

is a Solomonic solution: to educated people it recommends Unicode, without mentioning it explicitly.

> My personal experience is that it is true that Unicode is still considerably less widely usable than Latin1. Sure, Unicode is available in many contexts and systems. But the names in your message, just for example, came through as garbage to me.

That depends on your mailer. Is it a package in Emacs, or is it 'pine' without Bernhard Kaindl's patches?

> No doubt I personally could eventually configure everything involved to display it properly, but the point is that it doesn't just work.

True: there are some distributions where things don't just work, but these non-Unicode-enabled corners are diminishing. Maybe you can reformulate the last two paragraphs in a way that is less incorrect?

> > PS: The right spelling of the encodings is Latin1 (no dash, no space)
>
> I'm glad to know that, it's easier to type than @tie{} :). I had mostly seen it with a space. Do you happen to know where the definitive spelling is given?

It's at the IANA: http://www.iana.org/assignments/character-sets

Bruno
Re: [bug-gnulib] quote characters in stds
> to educated people it recommends Unicode, without mentioning it explicitly.

True. I do not know how else to write it. (I'm also not sure rms will go for it at all.)

> That depends on your mailer. Is it a package in Emacs, or is it 'pine' without Bernhard Kaindl's patches?

My personal configuration is not the point (it's vm inside emacs). My point is that it didn't come through correctly. I am sure I am not unique in this.

> Maybe you can reformulate the last two paragraphs in a way that is less incorrect?

Sorry, since I do not see what is incorrect about them, I do not know how to reformulate them. If you can suggest wording that makes you happier, please do.

> It's at the IANA: http://www.iana.org/assignments/character-sets

Thanks.