Markus Kuhn [EMAIL PROTECTED] writes:
c) Emacs - Current Emacs UTF-8 support is still a bit too provisional
for my comfort. In particular, I don't like that the UTF-8 mode is not
binary transparent. Work on turning Emacs completely into a UTF-8
editor is under way, and I'd
Markus Kuhn [EMAIL PROTECTED] writes:
As we are talking about en_US.UTF-8:
General warning: Please do not use the locale name en_US.UTF-8 anywhere
outside North America.
Why can't you use it for LC_CTYPE and LC_MESSAGES, say?
Determining paper size by locale is rather strange. What's
Markus Kuhn [EMAIL PROTECTED] writes:
The revised POSIX standard, which has been merged with the Single UNIX
Specification is now available online in HTML!
It is complicated to look up sections by their number. Or am I
missing something?
--
Linux-UTF8: i18n of Linux on all levels
Archive:
Bram Moolenaar [EMAIL PROTECTED] writes:
Ignoring case does not appear to lead to compatibility problems.
It does. Case is used to separate public and private namespace
(probably a design mistake). However, we should ignore case in the
charset: we are going to use mainly MIME charset names (at
Keld Jørn Simonsen [EMAIL PROTECTED] writes:
Can't you get access to them in the onsite department of the library?
(That is, the department where you cannot loan the books, but only
read them onsite).
No, definitely not. The librarians don't even know how to get those
standards (ISO and
Eli Zaretskii [EMAIL PROTECTED] writes:
The GNU Emacs/Unicode proposal I've seen seems to have this property,
too. (At least the proposal is ambiguous, and one interpretation is
that you can encode a single character in multiple ways.)
Unless you refer to the CNS plane and Japanese Han
Kenichi Handa [EMAIL PROTECTED] writes:
Florian Weimer [EMAIL PROTECTED] writes:
What does 'via surrogate pair' mean? I guess the second line should
read:
00 Unicode 20bit (U+1 - U+F)
Yes. That's correct, and the third line should read as below
.
It reflects the discussion on the `emacs-unicode' mailing list and
the `Emacs-Unicode-990824' proposal.
Version $Revision: 1.1 $, written by Florian Weimer.
Requirements
The internal character code of a character has to fit in 22 bits.
(The remaining bits of a 32 bit host
Richard Stallman [EMAIL PROTECTED] writes:
Supporting Unicode superficially while retaining the current internal
representation raises a number of problems, one of them being that the
internal representation has several alternatives for the same character
which correspond to the same code
H. Peter Anvin [EMAIL PROTECTED] writes:
Does that mean you're painting yourself into a corner, though,
requiring manual work to integrate the increasingly Unicode-based
infrastructure support that is becoming available? Odds are pretty
good that they are.
I don't think it is a good idea
Eli Zaretskii [EMAIL PROTECTED] writes:
Why can't you continue to use the MULE code and just change the
character sets to reflect certain aspects of Unicode?
The current plan for Unicode was discussed at length 3 years ago, and
the result was what I described.
Is the discussion archived
Eli Zaretskii [EMAIL PROTECTED] writes:
Emacs cannot use a pure UTF-8 encoding, since some cultures don't want
unification, and it was decided that Emacs should not force
unification on those cultures.
Why can't you continue to use the MULE code and just change the
character sets to reflect
[EMAIL PROTECTED] (Kai Henningsen) writes:
* Do we need a native wide char encoding, too (mostly for Win32 where
it's UTF-16, but possibly also some Asian thing)?
A single 'char' encoded in UTF-16? This sounds horrible.
I can't quite parse that.
If you've got a 16 bit wchar_t, there's
[EMAIL PROTECTED] (Kai Henningsen) writes:
* Do we need a native wide char encoding, too (mostly for Win32 where
it's UTF-16, but possibly also some Asian thing)?
A single 'char' encoded in UTF-16? This sounds horrible.
Markus Kuhn [EMAIL PROTECTED] writes:
Antiword is available from
http://www.winfield.demon.nl/
and provides significantly better DOC - plaintext conversion
than any Microsoft product.
Unfortunately, this is not true. It fails badly on Word documents
with embedded change
Bruno Haible [EMAIL PROTECTED] writes:
The programs we are waiting for are:
- emacs. In an UTF-8 locale, it does not set the
keyboard-coding-system to UTF-8, thus when I type umlaut keys
strange things happen. And it does not set the default file
encoding to UTF-8,
Markus Kuhn [EMAIL PROTECTED] writes:
Here is another iteration of the SCW control function definition, to
allow users of terminal emulators full control over whether single-width
or double-width glyphs will be used:
Why don't you use the Unicode tagging mechanism (or some special
Michael B. Allen [EMAIL PROTECTED] writes:
Why doesn't wchar_t play nice with Unicode?
It does, if your C implementation defines the macro name
__STDC_ISO_10646__ (see the C standard for additional information).
Sorry for this question which is slightly off topic:
Are the UTF-8 definitions in ISO/IEC 10646-1:200 and RFC 2279
identical or equivalent? Can any harm result if a normative document
refers to both definitions (this is a bad idea if the definitions are
slightly different).
And BTW: Does ISO
Markus Kuhn [EMAIL PROTECTED] writes:
Note that we have the exact same problem with various European/American
encodings such as CP437, where IBM and Microsoft came up with radically
different and incompatible mappings
If I'm not mistaken, at least one character in CP437 has even been
"H. Peter Anvin" [EMAIL PROTECTED] writes:
The Chinese Academy Of Sciences has published a set of scalable fonts
in several styles, but unfortunately in a proprietary format with
closed-source converters to PK format for usage with TeX.
Is there any descriptions of this format?
I
Tomohiro KUBOTA [EMAIL PROTECTED] writes:
I don't know about Chinese and Korean font projects.
The Chinese Academy Of Sciences has published a set of scalable fonts
in several styles, but unfortunately in a proprietary format with
closed-source converters to PK format for usage with
Markus Kuhn [EMAIL PROTECTED] writes:
The only characters for which double-width (square) is appropriate are
- Han ideographs
- Hiragana/Katakana
- Hangul
- CJK punctuation
- fullwidth forms
There are a few other characters which simply can't be displayed
properly
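The categories listed above correspond roughly to the East Asian Width
classes W (wide) and F (fullwidth) in the Unicode Character Database. A
minimal sketch, assuming that mapping is good enough for illustration:
`cell_width` and `text_width` are hypothetical helpers, not the wcwidth
implementation discussed on the list, and they ignore combining and
control characters.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Return 2 for wide/fullwidth characters, 1 otherwise.

    Hypothetical helper: it relies only on the East Asian Width property
    and does not handle combining marks or control characters.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def text_width(text: str) -> int:
    """Sum the cell widths of all characters in a string."""
    return sum(cell_width(ch) for ch in text)
```

With this sketch, a Han ideograph, a Hiragana letter, a Hangul syllable,
an ideographic comma and a fullwidth Latin letter each occupy two cells,
while plain ASCII occupies one.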
Markus Kuhn [EMAIL PROTECTED] writes:
CP437/CP850 is still used today in the MS-DOS box on *every* Windows98
machine in West Europe/US/etc.
These codepages are also used on IBM operating systems such as OS/2
and AIX, I guess.
Martin Norbäck [EMAIL PROTECTED] writes:
I think this is a simple issue of counting the vertical lines in the
glyph.
I think that's too coarse. There might be some cases in which existing
monospace fonts treat characters as single-width because systems with
9x16 or 8x8 glyph cells are
Bruno Haible [EMAIL PROTECTED] writes:
Wouldn't it be better to use standard names in all cases, and use a
simple Emacs lisp function to convert the standard name to an Emacs
name? The Emacs PO mode already has code for this.
I think Gnus implements a different, but similar
Markus Kuhn [EMAIL PROTECTED] writes:
I see actually no big problem to make all the circled and parenthesised
numbers and letters doublewidth in the standard wcwidth, or even the EM
DASH. It would just mean that the definition of wcwidth becomes an
actual design issue, and not just
Markus Kuhn [EMAIL PROTECTED] writes:
It seems that the soon to be released new TCL/Tk 8.3.3 is finally going
to be able to use *-iso10646-1 fonts directly, thanks to recent patches
by Jeff Hobbs [EMAIL PROTECTED] and Brent Welch [EMAIL PROTECTED].
BTW, what about their UTF-8
Bruno Haible [EMAIL PROTECTED] writes:
Edmund GRIMLEY EVANS asked on 1999-11-25:
Will iconv() in glibc-2.2 convert from utf-7?
Yes. It has been added to glibc-2.2 in order to cope with email
messages sent out in this encoding by some mailers in East Asia.
I've seen quite a few
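For illustration, the same kind of conversion can be sketched outside of
iconv(). The following uses Python's built-in utf-7 codec as a stand-in
(an assumption made for this example, not glibc's converter). The sample
string is the "Hi Mom" example from RFC 2152, and `utf7_to_utf8` is a
hypothetical helper name.

```python
def utf7_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-7 input and re-encode it as UTF-8 (hypothetical helper)."""
    return raw.decode("utf-7").encode("utf-8")


# "+Jjo-" is the UTF-7 shifted form of U+263A WHITE SMILING FACE.
sample = b"Hi Mom -+Jjo--!"
converted = utf7_to_utf8(sample)
```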
Edmund GRIMLEY EVANS [EMAIL PROTECTED] writes:
B) Emit a U+FFFD for every byte in a malformed UTF-8 sequence
This is what I do in Mutt. It's easy to implement and works for any
multibyte encoding; the program doesn't have to know about UTF-8.
This is what I recommend at the moment, with
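Option B above can be sketched without any knowledge of UTF-8's byte
layout. This is a minimal illustration, not Mutt's actual code:
`decode_lossy` and its `max_len` parameter are hypothetical, and the
sketch assumes a stateless multibyte encoding in which no character is
longer than `max_len` bytes.

```python
REPLACEMENT = "\ufffd"


def decode_lossy(data: bytes, encoding: str = "utf-8", max_len: int = 4) -> str:
    """Decode data, emitting U+FFFD for every byte that cannot start a character.

    At each position, try successively longer slices with the strict codec.
    If none decodes, emit one replacement character and skip a single byte.
    """
    out = []
    i = 0
    while i < len(data):
        for k in range(1, max_len + 1):
            try:
                out.append(data[i:i + k].decode(encoding))
                i += k
                break
            except UnicodeDecodeError:
                continue
        else:
            # No slice starting here was valid: one U+FFFD per bad byte.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)
```

For example, the truncated sequence E2 82 followed by an ASCII letter
yields two replacement characters and then the letter, because each of
the two stray bytes is replaced on its own.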