Re: [bug-gnulib] quote characters in stds

2005-06-09 Thread Karl Berry
    The main point is that it transmits the perception that 

Now I understand.  Thanks.

    These two paragraphs seem out of place:

I had been thinking of that as referring only to quotation characters,
but I see that you are right.  Not sure what rms will think, but it does
seem cleaner to have two separate sections, so let's try that.

Trying to take both your latest comments into account, now I have the
following ...


@node Character set
@section Character set
@cindex character set
@cindex encodings
@cindex ASCII characters
@cindex non-ASCII characters

Sticking to the ASCII character set (plain text, 7-bit characters) is
preferred in GNU source code comments, text documents, and other
contexts, unless there is good reason to do something else because of
the domain at hand.

If you need to use non-ASCII characters, for example to represent
names of contributors, you should normally stick with one encoding, as
one cannot in general mix encodings reliably.  


@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for
quotation characters in messages to users: preferably 0x60 (`) for
left quotes and 0x27 (') for right quotes.  If using ` is unacceptable
in your application, other possibilities are using ' for both opening
and closing, or 0x22 (") for both opening and closing.  It is ok, but
not required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
and @code{quotearg} modules provide a reasonably straightforward way
to support locale-specific quote characters, as well as taking care of
other issues, such as quoting a filename that itself contains a quote
character.  See the Gnulib documentation for usage details.

In any case, the documentation for your program should clearly specify
how it does quoting, if different than the preferred method of ` and
'.  This is especially important if the output of your program is ever
likely to be parsed by another program.

Quotation characters are a difficult area in the computing world at
this time: there are no true left or right quote characters in ASCII,
or even Latin1; the ` character we use was standardized as a grave
accent.  Latin1 does have paired standalone accents, but it seems
wrong in principle to abuse them as quotes.  Also, Latin1 is still not
universally usable.

Unicode contains the unambiguous quote characters required, and its
common encoding UTF-8 is upward compatible with ASCII.  However,
Unicode and UTF-8 are not universally well-supported, either. 

This may change over the next few years, and then we will revisit
this.
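
To make the preferred ` and ' convention from the draft concrete, here
is a minimal C sketch.  The helper quote_ascii is hypothetical and is
not the Gnulib quote or quotearg interface; the backslash escaping of
an embedded quote is only one possible choice, assumed here for
illustration, and a real program should document whatever it does.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Gnulib: wrap NAME in ` and ',
   backslash-escaping any embedded ' or \ so that the closing quote
   stays unambiguous for a program parsing the output.
   Returns 0 on success, -1 if OUT (of OUTSIZE bytes) is too small.  */
static int
quote_ascii (const char *name, char *out, size_t outsize)
{
  size_t n = 0;

  /* Need room for at least the two quotes and the terminating NUL.  */
  if (outsize < 3)
    return -1;
  out[n++] = '`';
  for (; *name; name++)
    {
      size_t needs_escape = (*name == '\'' || *name == '\\');

      /* Reserve room for this character (plus its escape), the
         closing quote, and the terminating NUL.  */
      if (n + needs_escape + 3 > outsize)
        return -1;
      if (needs_escape)
        out[n++] = '\\';
      out[n++] = *name;
    }
  out[n++] = '\'';
  out[n] = '\0';
  return 0;
}
```

A caller would fill a buffer with quote_ascii and then print it in a
message such as "cannot open %s".  A file name that itself contains a
' is the case where the escaping matters.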




___
bug-gnulib mailing list
bug-gnulib@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnulib


Re: [bug-gnulib] quote characters in stds

2005-06-08 Thread Paul Eggert
Bruno Haible [EMAIL PROTECTED] writes:

 Five years ago, people made up lists of programs that _do_ work with UTF-8
 encoded text files. Today, these programs are uncountable. Instead, people
 make up lists of programs that _don't_ work with Unicode:
   http://www.freedesktop.org/wiki/Software_2fBadSoftware

Well, to be fair, lots of programs work OK with UTF-8 in common cases,
but mess up when they're thrown something hard.  Certainly the list
you referred to is woefully incomplete.

I read your email containing accented letters with GNU Emacs 21.4 and
Gnus 5.10.6, a combination that supports UTF-8.  But I didn't see the
accented letters correctly on my screen: I saw ? instead.  This is
because I was logged in via an ssh xterm window from Debian woody,
whose xterm doesn't support UTF-8.  Now that Debian sarge is stable I
will look into switching to a better xterm, but this will take some of
my time (I'm not looking forward to upgrading all my machines to
sarge) and even then I'm not sure things will work (the last time
I tried uxterm it flaked out on me too often for my comfort).

It's unlikely we'll change RMS's opinion on UTF-8 right now, but I
think we could tone down the language a bit without too much trouble.
How about if we change "deployed even less widely than Latin1" (which
is true some places but not others, at least in my experience) to
"still not universally well-supported" (which is the point, after
all)?  E.g., change this:

  Unicode contains the unambiguous quote characters required, and its
  common encoding UTF-8 is upward compatible with ASCII.  But Unicode
  and UTF-8 are deployed even less widely than Latin1; it would be
  premature to require Unicode support for running essentially every GNU
  program.

to this:

  Unicode contains the unambiguous quote characters required, and its
  common encoding UTF-8 is upward compatible with ASCII.  But Unicode
  and UTF-8 are still not universally well-supported; it would be
  premature to require Unicode support for running essentially every GNU
  program.


One other comment.  These two paragraphs seem out of place:

  ASCII should also be preferred in source code comments, text
  documents, and other contexts, unless there is good reason to do
  something else because of the domain at hand.

  If you need to use non-ASCII characters, for example to represent
  names of contributors, you should normally stick with one encoding, as
  one cannot in general mix encodings reliably.

How about if we create a new section "Non-ASCII characters" and put it
before this new "Quote characters" section?  That might make the
organization clearer.
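
The two encoding claims in the draft text (UTF-8 is upward compatible
with ASCII, and encodings cannot be mixed reliably) can be checked
with a small C sketch.  The function name utf8_encode_bmp is made up
for this example, and it only handles code points up to U+FFFF without
checking for surrogates, so treat it as an illustration rather than a
complete encoder.

```c
#include <stddef.h>

/* Encode code point CP as UTF-8 into OUT, which must have room for
   3 bytes.  Only code points up to U+FFFF are handled (surrogates are
   not rejected).  Returns the number of bytes written, or 0 if CP is
   outside the range this sketch supports.  */
static size_t
utf8_encode_bmp (unsigned long cp, unsigned char out[3])
{
  if (cp < 0x80)
    {
      /* ASCII: the UTF-8 byte is the ASCII byte itself.  */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* Two bytes: 110xxxxx 10xxxxxx.  */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.  */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  return 0;
}
```

With this, ` (0x60) stays the single byte 60, which is the upward
compatibility with ASCII.  The Unicode quotes U+2018 and U+2019 become
the three-byte sequences E2 80 98 and E2 80 99.  A Latin1 letter such
as U+00E9 becomes C3 A9 in UTF-8 but is the single byte E9 in Latin1,
which is why a file mixing the two encodings cannot be decoded
reliably.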




Re: [bug-gnulib] quote characters in stds

2005-06-07 Thread Bruno Haible
Karl Berry wrote:

 @node Quote characters
 @section Quote characters
 @cindex quote characters

 In the C locale, GNU programs should stick to plain ASCII for
 quotation characters in messages to users: either 0x60 (`) for left
 quotes and 0x27 (') for right quotes, or ' for both opening and
 closing, or " (0x22) for both opening and closing.  It is ok, but not
 required, to use locale-specific quotes in other locales.

 The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
 and @code{quotearg} modules provide a reasonably straightforward way
 to support locale-specific quote characters, as well as taking care of
 other issues, such as quoting a filename that itself contains a quote
 character.  See the Gnulib documentation for usage details.

 ASCII should also be preferred in source code comments, text
 documents, and other contexts, unless there is good reason to do
 something else because of the domain at hand.

Agreed.

 If you need to use non-ASCII characters, for example to represent
 names of contributors, you should normally stick with one encoding, as
 one cannot in general mix encodings reliably.  Latin@tie{}1 is the
 most widely usable encoding today, after plain ASCII.

This is misleading. In a list of contributors, I often find names like
Rafał Maszkowski, Primož Peterlin, Martin Mokrejš, and Владимир Слепнев
(Vladimir Slepnev). To represent them, you need Unicode, i.e. the
UTF-8 encoding.

 Quotation characters are a difficult area in the computing world at
 this time: there are no true left or right quote characters in ASCII,
 or even Latin@tie{}1.  Latin@tie{}1 does have paired standalone
 accents, but it seems wrong in principle to abuse them as quotes.  And
 even Latin@tie{}1 is not universally usable.

 Unicode contains the unambiguous quote characters required, and its
 common encoding UTF-8 is upward compatible with ASCII.

Agreed.

 But Unicode and UTF-8 are deployed less widely than Latin@tie{}1; it
 would be premature to require Unicode support for running essentially
 every GNU program.

This has not been true for several years now. The major GUI toolkits,
KDE/Qt and GNOME/Gtk, have supported Unicode for several years, and now
feature good support not only of Western and CJK languages, but also of
Bidi scripts and Indic languages. 'vi' has been UTF-8 enabled since 2001.
For more than a year, major Linux distributions like Fedora Core 3 have
put users into UTF-8 locales by default.
See http://www.cl.cam.ac.uk/~mgk25/unicode.html for more info.

Bruno

PS: The right spelling of the encodings is Latin1 (no dash, no space)
and UTF-8 (with a HYPHEN-MINUS in between).





Re: [bug-gnulib] quote characters in stds

2005-06-07 Thread Karl Berry
    This is misleading.

I know, but I'm not sure what to say.  Just delete the sentence about
Latin1, maybe?  I guess it's not really necessary.

    To represent them, you need Unicode, i.e. the UTF-8 encoding.

Yes, but rms has explicitly rejected (in previous email with me) the
idea of recommending the use of UTF-8 in any context whatsoever.  Sigh.

    This has not been true for several years now.

Well, whether or not it is true, rms will not accept it, so there's no
sense arguing it here.

My personal experience is that it is true that Unicode is still
considerably less widely usable than Latin1.  Sure, Unicode is available
in many contexts and systems.  But the names in your message, just for
example, came through as garbage to me.  No doubt I personally could
eventually configure everything involved to display it properly, but the
point is that it doesn't just work.  And I suspect I am using far
newer versions of everything than an average user.

    PS: The right spelling of the encodings is Latin1 (no dash, no space)

I'm glad to know that, it's easier to type than @tie{} :).  I had mostly
seen it with a space.  Do you happen to know where the definitive
spelling is given?  I've poked around the ISO site without success.


Another draft below.  I'm not quite sure why ` would ever be
unacceptable, and I'm a bit skeptical that it will pass muster with
rms, but I'm trying to avoid an argument with standards-mavens.  And gcc
4 already does '...'.  Any improved wording and/or backup facts welcome :).

Thanks,
k


@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for
quotation characters in messages to users: preferably 0x60 (`) for
left quotes and 0x27 (') for right quotes.  If using ` is unacceptable
in your application, other possibilities are using ' for both opening
and closing, or " (0x22) for both opening and closing.  It is ok, but
not required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
and @code{quotearg} modules provide a reasonably straightforward way
to support locale-specific quote characters, as well as taking care of
other issues, such as quoting a filename that itself contains a quote
character.  See the Gnulib documentation for usage details.

In any case, the documentation for your program should clearly specify
how it does quoting, if different than the preferred method of ` and
'.  This is especially important if the output of your program is ever
likely to be parsed by another program.

ASCII should also be preferred in source code comments, text
documents, and other contexts, unless there is good reason to do
something else because of the domain at hand.

If you need to use non-ASCII characters, for example to represent
names of contributors, you should normally stick with one encoding, as
one cannot in general mix encodings reliably.  

Quotation characters are a difficult area in the computing world at this
time: there are no true left or right quote characters in ASCII, or even
Latin1 (the ` character we use is standardized as a grave accent).
Latin1 does have paired standalone accents, but it seems wrong in
principle to abuse them as quotes.  And even Latin1 is not universally
usable.

Unicode contains the unambiguous quote characters required, and its
common encoding UTF-8 is upward compatible with ASCII.  But Unicode
and UTF-8 are deployed even less widely than Latin1; it would be
premature to require Unicode support for running essentially every GNU
program.

Perhaps the prevailing situation will change in a few years, and then
we will revisit this.




Re: [bug-gnulib] quote characters in stds

2005-06-07 Thread Bruno Haible
Karl Berry wrote:
 Yes, but rms has explicitly rejected (in previous email with me) the
 idea of recommending the use of UTF-8 in any context whatsoever.  Sigh.

Sigh. What you wrote there:

   If you need to use non-ASCII characters, for example to represent
   names of contributors, you should normally stick with one encoding, as
   one cannot in general mix encodings reliably.  

is a Solomonic solution: to educated people it recommends Unicode, without
mentioning it explicitly.

 My personal experience is that it is true that Unicode is still
 considerably less widely usable than Latin1.  Sure, Unicode is available
 in many contexts and systems.  But the names in your message, just for
 example, came through as garbage to me.

That depends on your mailer. Is it a package in Emacs, or is it 'pine'
without Bernhard Kaindl's patches?

 No doubt I personally could
 eventually configure everything involved to display it properly, but the
 point is that it doesn't just work.

True: there are some distributions where things don't just work, but these
non-Unicode-enabled corners are diminishing.

Maybe you can reformulate the last two paragraphs in a way that is less
incorrect?

 PS: The right spelling of the encodings is Latin1 (no dash, no space)

 I'm glad to know that, it's easier to type than @tie{} :).  I had mostly
 seen it with a space.  Do you happen to know where the definitive
 spelling is given?

It's at the IANA: http://www.iana.org/assignments/character-sets

Bruno





Re: [bug-gnulib] quote characters in stds

2005-06-07 Thread Karl Berry
    to educated people it recommends Unicode, without mentioning it explicitly.

True.  I do not know how else to write it.  (I'm also not sure rms will
go for it at all.)

    That depends on your mailer. Is it a package in Emacs, or is it 'pine'
    without Bernhard Kaindl's patches?

My personal configuration is not the point (it's vm inside emacs).  My
point is that it didn't come through correctly.  I am sure I am not
unique in this.

    Maybe you can reformulate the last two paragraphs in a way that is
    less incorrect?

Sorry, since I do not see what is incorrect about them, I do not know
how to reformulate them.  If you can suggest wording that makes you
happier, please do.

    It's at the IANA: http://www.iana.org/assignments/character-sets

Thanks.

