Re: Switching to UTF-8

2002-05-01 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: c) Emacs - Current Emacs UTF-8 support is still a bit too provisional for my comfort. In particular, I don't like that the UTF-8 mode is not binary transparent. Work on turning Emacs completely into a UTF-8 editor is under way, and I'd

Re: Please do not use en_US.UTF-8 outside the US

2002-05-01 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: As we are talking about en_US.UTF-8: General warning: Please do not use the locale name en_US.UTF-8 anywhere outside North America. Why can't you use it for LC_CTYPE and LC_MESSAGES, say? Determining paper size by locale is rather strange. What's

Re: POSIX:2001 now available online

2002-02-13 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: The revised POSIX standard, which has been merged with the Single UNIX Specification is now available online in HTML! It is complicated to look up sections by their number. Or am I missing something? -- Linux-UTF8: i18n of Linux on all levels Archive:

Re: [I18n]Re: Li18nux Locale Name Guideline Public Review

2002-01-22 Thread Florian Weimer
Bram Moolenaar [EMAIL PROTECTED] writes: Ignoring case does not appear to lead to compatibility problems. It does. Case is used to separate public and private namespace (probably a design mistake). However, we should ignore case in the charset: we are going to use mainly MIME charset names (at

Re: Free availability of ISO/IEC standards

2002-01-04 Thread Florian Weimer
Keld Jørn Simonsen [EMAIL PROTECTED] writes: Can't you get access to them in the onsite department of the library? (That is, the department where you cannot loan the books, but only read them onsite). No, definitely not. The librarians don't even know how to get those standards (ISO and

Re: unicode in emacs 21

2001-11-04 Thread Florian Weimer
Eli Zaretskii [EMAIL PROTECTED] writes: The GNU Emacs/Unicode proposal I've seen seems to have this property, too. (At least the proposal is ambiguous, and one interpretation is that you can encode a single character in multiple ways.) Unless you refer to the CNS plane and Japanese Han

Re: Unicode in Emacs again

2001-11-04 Thread Florian Weimer
Kenichi Handa [EMAIL PROTECTED] writes: Florian Weimer [EMAIL PROTECTED] writes: What does 'via surrogate pair' mean? I guess the second line should read: 00 Unicode 20bit (U+1 - U+F) Yes. That's correct, and the third line should read as below
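For readers unsure what "via surrogate pair" refers to: in standard UTF-16, a code point above U+FFFF is split into 20 bits and carried by two 16-bit units. The sketch below shows that standard arithmetic only; it is not taken from the Emacs proposal under discussion, and the function names are made up for illustration.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

The 20-bit offset is why a surrogate pair can address exactly 16 planes beyond the Basic Multilingual Plane.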

A more verbose version of Emacs-Unicode-990824

2001-11-04 Thread Florian Weimer
. It reflects the discussion on the `emacs-unicode' mailing list and the `Emacs-Unicode-990824' proposal. Version $Revision: 1.1 $, written by Florian Weimer. Requirements The internal character code of a character has to fit in 22 bits. (The remaining bits of a 32 bit host

Re: unicode in emacs 21

2001-10-30 Thread Florian Weimer
Richard Stallman [EMAIL PROTECTED] writes: Supporting Unicode superficially while retaining the current internal representation raises a number of problems, one of them being that the internal representation has several alternatives for the same character which correspond to the same code

Re: unicode in emacs 21

2001-10-28 Thread Florian Weimer
H. Peter Anvin [EMAIL PROTECTED] writes: Does that mean you're painting yourself into a corner, though, requiring manual work to integrate the increasingly Unicode-based infrastructure support that is becoming available? Odds are pretty good that they are. I don't think it is a good idea

Re: unicode in emacs 21

2001-10-28 Thread Florian Weimer
Eli Zaretskii [EMAIL PROTECTED] writes: Why can't you continue to use the MULE code and just change the character sets to reflect certain aspects of Unicode? The current plan for Unicode was discussed at length 3 years ago, and the result was what I described. Is the discussion archived

Re: unicode in emacs 21

2001-10-27 Thread Florian Weimer
Eli Zaretskii [EMAIL PROTECTED] writes: Emacs cannot use a pure UTF-8 encoding, since some cultures don't want unification, and it was decided that Emacs should not force unification on those cultures. Why can't you continue to use the MULE code and just change the character sets to reflect

Re: UTF16 and GCC

2001-08-08 Thread Florian Weimer
[EMAIL PROTECTED] (Kai Henningsen) writes: * Do we need a native wide char encoding, too (mostly for Win32 where it's UTF-16, but possibly also some Asian thing)? A single 'char' encoded in UTF-16? This sounds horrible. I can't quite parse that. If you've got a 16 bit wchar_t, there's

Re: UTF16 and GCC

2001-08-05 Thread Florian Weimer
[EMAIL PROTECTED] (Kai Henningsen) writes: * Do we need a native wide char encoding, too (mostly for Win32 where it's UTF-16, but possibly also some Asian thing)? A single 'char' encoded in UTF-16? This sounds horrible.

Re: Word and Antiword

2001-07-14 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: Antiword is available from http://www.winfield.demon.nl/ and provides significantly better DOC - plaintext conversion than any Microsoft product. Unfortunately, this is not true. It fails badly on Word documents with embedded change

Re: file name encoding

2001-06-27 Thread Florian Weimer
Bruno Haible [EMAIL PROTECTED] writes: The programs we are waiting for are: - emacs. In a UTF-8 locale, it does not set the keyboard-coding-system to UTF-8, thus when I type umlaut keys strange things happen. And it does not set the default file encoding to UTF-8,

Re: Set Character Width Proposal (Version 3)

2001-06-24 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: Here is another iteration of the SCW control function definition, to allow users of terminal emulators full control over whether single-width or double-width glyphs will be used: Why don't you use the Unicode tagging mechanism (or some special

Re: wchar_t -- Unicode Conversion

2001-06-02 Thread Florian Weimer
Michael B. Allen [EMAIL PROTECTED] writes: Why doesn't wchar_t play nice with Unicode? It does, if your C implementation defines the macro name __STDC_ISO_10646__ (see the C standard for additional information).

UTF-8 in RFC 2279 and ISO 10646

2001-05-01 Thread Florian Weimer
Sorry for this question which is slightly off topic: Are the UTF-8 definitions in ISO/IEC 10646-1:200 and RFC 2279 identical or equivalent? Can any harm result if a normative document refers to both definitions (this is a bad idea if the definitions are slightly different). And BTW: Does ISO

Re: REVERSE SOLIDUS in JIS0208.TXT

2001-04-15 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: Note that we have the exact same problem with various European/American encodings such as CP437, where IBM and Microsoft came up with radically different and incompatible mappings If I'm not mistaken, at least one character in CP437 has even been

Re: Unicode is optimal for Chinese/Japanese multilingual texts

2001-04-14 Thread Florian Weimer
"H. Peter Anvin" [EMAIL PROTECTED] writes: The Chinese Academy Of Sciences has published a set of scalable fonts in several styles, but unfortunately in a proprietary format with closed-source converters to PK format for usage with TeX. Is there any descriptions of this format? I

Re: Unicode is optimal for Chinese/Japanese multilingual texts

2001-04-11 Thread Florian Weimer
Tomohiro KUBOTA [EMAIL PROTECTED] writes: I don't know about Chinese and Korean font projects. The Chinese Academy Of Sciences has published a set of scalable fonts in several styles, but unfortunately in a proprietary format with closed-source converters to PK format for usage with

Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-11 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: The only characters for which double-width (square) is appropriate are - Han ideographs - Hiragana/Katakana - Hangul - CJK punctuation - fullwidth forms There are a few other characters which simply can't be displayed properly
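The character classes listed above correspond roughly to the East Asian Width property values W (wide) and F (fullwidth) in the Unicode character database. As a rough sketch, and not the wcwidth implementation discussed in this thread, column counts can be approximated with Python's `unicodedata` module; `char_columns` and `text_columns` are hypothetical names, and control characters are not treated specially here.

```python
import unicodedata


def char_columns(ch):
    """Approximate number of terminal columns for one character."""
    if unicodedata.combining(ch):
        return 0                      # combining marks overlay the base
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                      # wide and fullwidth characters
    return 1                          # everything else, incl. ambiguous


def text_columns(text):
    """Sum the approximate column widths of all characters in text."""
    return sum(char_columns(ch) for ch in text)
```

Characters with ambiguous width, such as Cyrillic letters, count as one column in this sketch; whether they should ever be double-width is exactly the design question raised in these threads.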

Re: Doublewidth block graphics for unhappy MS-DOS users

2001-04-11 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: CP437/CP850 is still used today in the MS-DOS box on *every* Windows98 machine in West Europe/US/etc. These codepages are also used on IBM operating systems such as OS/2 and AIX, I guess.

Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-11 Thread Florian Weimer
Martin Norbäck [EMAIL PROTECTED] writes: I think this is a simple issue of counting the vertical lines in the glyph. I think that's too coarse. There might be some cases in which existing monospace fonts treat characters as single-width because systems with 9x16 or 8x8 glyph cells are

Re: multilingual man pages

2001-04-11 Thread Florian Weimer
Bruno Haible [EMAIL PROTECTED] writes: Wouldn't it be better to use standard names in all cases, and use a simple Emacs lisp function to convert the standard name to an Emacs name? The Emacs PO mode already has code for this. I think Gnus implements a different, but similar

Re: Doublewidth EM DASH for unhappy English people

2001-04-11 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: I see actually no big problem to make all the circled and parenthesised numbers and letters doublewidth in the standard wcwidth, or even the EM DASH. It would just mean that the definition of wcwidth becomes an actual design issue, and not just

Re: TCL/Tk and ISO10646-1 fonts

2001-04-08 Thread Florian Weimer
Markus Kuhn [EMAIL PROTECTED] writes: It seems that the soon to be released new TCL/Tk 8.3.3 is finally going to be able to use *-iso10646-1 fonts directly, thanks to recent patches by Jeff Hobbs [EMAIL PROTECTED] and Brent Welch [EMAIL PROTECTED]. BTW, what about their UTF-8

Re: iconv in glibc

2000-09-30 Thread Florian Weimer
Bruno Haible [EMAIL PROTECTED] writes: Edmund GRIMLEY EVANS asked on 1999-11-25: Will iconv() in glibc-2.2 convert from utf-7? Yes. It has been added to glibc-2.2 in order to cope with email messages sent out in this encoding by some mailers in East Asia. I've seen quite a few
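For readers who have not met UTF-7 in mail: it keeps most ASCII as-is and wraps other characters in a `+...-` run of modified base64 over UTF-16. The sketch below uses Python's built-in `utf-7` codec rather than glibc's iconv, purely to show what such text looks like; the sample string is the well-known example from RFC 2152, and `decode_utf7` is a made-up helper name.

```python
def decode_utf7(raw):
    """Decode a UTF-7 byte string using Python's built-in codec."""
    return raw.decode("utf-7")


# "+Jjo-" is the shifted form of U+263A; the second "-" is a literal hyphen.
sample = decode_utf7(b"Hi Mom -+Jjo--!")

# "+-" is the escape for a literal plus sign.
plus = decode_utf7(b"1 +- 1")
```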

Re: Substituting malformed UTF-8 sequences in a decoder

2000-07-25 Thread Florian Weimer
Edmund GRIMLEY EVANS [EMAIL PROTECTED] writes: B) Emit a U+FFFD for every byte in a malformed UTF-8 sequence This is what I do in Mutt. It's easy to implement and works for any multibyte encoding; the program doesn't have to know about UTF-8. This is what I recommend at the moment, with
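Option B above can be sketched as a decoder error handler that emits one U+FFFD and skips exactly one byte before retrying, so the caller never needs to know how UTF-8 sequences are structured. This is a minimal illustration in Python, not Mutt's code; the handler name `fffd_per_byte` and the helper `decode_lenient` are assumptions made for this example.

```python
import codecs


def _fffd_per_byte(exc):
    """On a decode error, emit U+FFFD and resume one byte later."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # Skipping a single byte means every byte of a malformed sequence
    # that cannot start a valid character yields its own U+FFFD.
    return ("\ufffd", exc.start + 1)


codecs.register_error("fffd_per_byte", _fffd_per_byte)


def decode_lenient(data, encoding="utf-8"):
    """Decode bytes, replacing each undecodable byte with U+FFFD."""
    return data.decode(encoding, errors="fffd_per_byte")
```

Because the handler only looks at the error position, the same function works with other multibyte codecs by passing a different `encoding`.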