Re: unicode in emacs 21

2001-11-04 Thread Florian Weimer

Eli Zaretskii [EMAIL PROTECTED] writes:

 The GNU Emacs/Unicode proposal I've seen seems to have this property,
 too.  (At least the proposal is ambiguous, and one interpretation is
 that you can encode a single character in multiple ways.)

 Unless you refer to the CNS plane and Japanese Han characters, which
 were deliberately left ununified (in addition to the Unicode
 codepoints for those characters), I think you are mistaken.

I hope so. ;-)

 Could you please point out where in the proposal do you see that a
 character can be encoded in multiple ways?

I think now that the surrogate stuff has been explained, the encoding
to UCS-E (Unicode-compatible Character Set for Emacs) is indeed
unambiguous.

However, UTF-E (the buffer encoding) opens possibilities for different
encodings of the same UCS-E code point, but this can be resolved, I
think.
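
The thread does not spell out UTF-E, so the following is only a sketch of the general problem, using plain UTF-8 as a stand-in: a lax decoder accepts "overlong" forms, which gives one code point several byte encodings. The function names are hypothetical, chosen for illustration.

```python
def encode_2byte(cp: int) -> bytes:
    """Force a two-byte UTF-8-style form, even when one byte would do."""
    if not 0 <= cp <= 0x7FF:
        raise ValueError("code point does not fit in two bytes")
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


def lax_decode_2byte(data: bytes) -> int:
    """Decode a two-byte form without checking for minimal length."""
    lead, trail = data
    return ((lead & 0x1F) << 6) | (trail & 0x3F)


def is_accepted_by_strict_decoder(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


canonical = "A".encode("utf-8")      # shortest form, one byte
overlong = encode_2byte(ord("A"))    # same code point, two bytes

# Both byte strings name U+0041 to a lax decoder, but only the
# shortest form passes a strict one.
print(canonical, overlong, lax_decode_2byte(overlong) == ord("A"))
print(is_accepted_by_strict_decoder(canonical),
      is_accepted_by_strict_decoder(overlong))
```

Requiring the shortest form on input is one way such an ambiguity can be resolved; whether UTF-E would take that route is not stated here.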
--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




Re: unicode in emacs 21

2001-11-04 Thread Dave Love

 JK == Jimmy Kaplowitz [EMAIL PROTECTED] writes:

 JK It's the only editor I've used (including Yudit) that could
 JK display the sequence U+0283 U+034D correctly.

[With what font?]

Note that character composition (combination) is a user-level feature
in Emacs, so if rules are implemented which you don't like, you can
change them.

 JK Well, Emacs does have more features (including some that are less
 JK essential, such as doctor mode :), but vim has quite enough for
 JK most purposes.

I assumed the point was specifically about the display, tty v. X.




Re: unicode in emacs 21

2001-11-04 Thread Dave Love

 EZ == Eli Zaretskii [EMAIL PROTECTED] writes:

 EZ The current plan for Unicode was discussed at length 3 years ago, and
 EZ the result was what I described.  I don't think it's wise for us to
 EZ reopen that discussion again

Well I, at least, don't understand why it's necessary, at least for
technical reasons.  I have a fair amount of experience as a user and
implementor.




Re: unicode in emacs 21

2001-11-04 Thread Dave Love

 EZ == Eli Zaretskii [EMAIL PROTECTED] writes:

 EZ Unless you refer to the CNS plane and Japanese Han characters,
 EZ which were deliberately left ununified (in addition to the
 EZ Unicode codepoints for those characters), I think you are
 EZ mistaken.

I.e., he's right.

Someone needs to give a cogent argument why it's a problem in practice
to have multiple representations if you can canonicalize as required,
especially why this should be any different for Western scripts than
for CJK.  Note that I have some practical experience of this in Emacs.
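
As an analogy only (this is Unicode's own normalization, not the Emacs mechanism referred to above), the sketch below shows two representations of the same Western character that compare unequal as stored but equal once canonicalized on demand. The helper name canonical_equal is hypothetical.

```python
import unicodedata

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT


def canonical_equal(a: str, b: str) -> bool:
    """Compare two strings after canonicalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                 # raw comparison
print(canonical_equal(precomposed, decomposed))  # after canonicalization
```

The stored text keeps whichever form it arrived in; canonicalization is applied only where equality matters, such as searching or comparing.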




Re: unicode in emacs 21

2001-10-30 Thread Florian Weimer

Richard Stallman [EMAIL PROTECTED] writes:

 Supporting Unicode superficially while retaining the current internal
 representation raises a number of problems, one of them being that the
 internal representation has several alternatives for the same character
 which correspond to the same code point in Unicode.

The GNU Emacs/Unicode proposal I've seen seems to have this property,
too.  (At least the proposal is ambiguous, and one interpretation is
that you can encode a single character in multiple ways.)



Re: unicode in emacs 21

2001-10-30 Thread Eli Zaretskii

 From: Florian Weimer [EMAIL PROTECTED]
 Date: Tue, 30 Oct 2001 08:09:20 +0100
 
 Richard Stallman [EMAIL PROTECTED] writes:
 
  Supporting Unicode superficially while retaining the current internal
  representation raises a number of problems, one of them being that the
  internal representation has several alternatives for the same character
  which correspond to the same code point in Unicode.
 
 The GNU Emacs/Unicode proposal I've seen seems to have this property,
 too.  (At least the proposal is ambiguous, and one interpretation is
 that you can encode a single character in multiple ways.)

Unless you refer to the CNS plane and Japanese Han characters, which
were deliberately left ununified (in addition to the Unicode
codepoints for those characters), I think you are mistaken.  Could you
please point out where in the proposal do you see that a character can
be encoded in multiple ways?



Re: unicode in emacs 21

2001-10-29 Thread Richard Stallman

That view is unfair to the people who have done lots of work, himi in
particular.  `Working on Unicode support' in my book isn't restricted
to implementing an apparently-unnecessary, disruptive, incompatible
change to the internal encoding, even if it's what one wants ideally.

I think that supporting Unicode at the internal level is the best way
to support it fully, and that's what we have decided to do.  As a
result of that decision, we are sometimes reluctant to put time into
studying, installing and maintaining other approaches which would be
obsolete once we do it the right way.

Supporting Unicode superficially while retaining the current internal
representation raises a number of problems, one of them being that the
internal representation has several alternatives for the same character
which correspond to the same code point in Unicode.



Re: unicode in emacs 21

2001-10-29 Thread Dave Love

 MK == Markus Kuhn [EMAIL PROTECTED] writes:

 MK CJK Greek/Cyrillic characters are traditionally displayed as
 MK double-width, whereas ISO 8859/ISO 10646 Greek & Cyrillic
 MK characters are traditionally displayed single-width.

Yes, but...

 MK But surely all the European encodings such as ISO 8859, KOI,
 MK etc. should be urgently unified with Unicode.

The implementation you may recall hearing about earlier in the year is
now available (posted to gnu.emacs.sources).



Re: unicode in emacs 21

2001-10-29 Thread Dave Love

 MK == Markus Kuhn [EMAIL PROTECTED] writes:

 MK Using UTF-8 as the internal Emacs encoding is one way of achieving
 MK continued guaranteed binary transparency, 

I.e., maintain a malformed internal representation??

 MK coming up with a tricky encoding for malformed UTF-8 sequences is
 MK another one.

We can maintain arbitrary byte sequences now.  It's not terribly
tricky, just not too robust through the use of the eight-bit-x
charsets.

I don't think it's very important that reading and writing malformed
sequences by utf-8.el isn't always idempotent.  Presumably the three
or four relevant test cases could be addressed in the CCL, but I think
there are better things to spend the time on.



Re: unicode in emacs 21

2001-10-28 Thread Eli Zaretskii

[I suggest having this discussion on the emacs-unicode mailing list, so I
added it to the list of addressees.]

 From: Florian Weimer [EMAIL PROTECTED]
 Date: Sun, 28 Oct 2001 00:58:29 +0200
 
 Eli Zaretskii [EMAIL PROTECTED] writes:
 
  Emacs cannot use a pure UTF-8 encoding, since some cultures don't want
  unification, and it was decided that Emacs should not force
  unification on those cultures.
 
 Why can't you continue to use the MULE code and just change the
 character sets to reflect certain aspects of Unicode?

The current plan for Unicode was discussed at length 3 years ago, and
the result was what I described.  I don't think it's wise for us to
reopen that discussion again, unless you think the UTF-8-based
representation is a terribly wrong design.

 One such aspect
 is Latin unification, for example.  (The Unicode people get very
 annoyed if you talk about unification, source separation rule etc.
 in the context of non-Han scripts...)

IIRC, the term unification appears early in the Unicode standard,
not necessarily in conjunction with ``Han unification''.  It is cited
as one of the principles of the Unicode approach.  So I don't see any
reason for the unnamed Unicode people to get annoyed by a term they
themselves coined.

 In a second step, support for normalization, combining characters
 etc. would have to be added, but this could be based on the reliable
 foundation of the old MULE code.

Conceivably, changing the internal representation doesn't mean we need
to rewrite all of the existing code, just the low-level parts of it
that deal with code conversions (i.e. subroutines of encoding and
decoding functions).



Re: unicode in emacs 21

2001-10-28 Thread Florian Weimer

H. Peter Anvin [EMAIL PROTECTED] writes:

 Does that mean you're painting yourself into a corner, though,
 requiring manual work to integrate the increasingly Unicode-based
 infrastructure support that is becoming available?  Odds are pretty
 good that they are.

I don't think it is a good idea to use operating system Unicode
support.  This would mean that GNU Emacs behaves differently on
different operating systems, depending on the installed locale
descriptions, for example.

OTOH, the character encodings posted earlier to this list are as
incompatible with existing Unicode support as the current emacs-mule
internal encoding.  In effect, just one Emacs-specific internal
encoding is replaced by another.



Re: unicode in emacs 21

2001-10-28 Thread Janusz S. Bień

On Eli Zaretskii [EMAIL PROTECTED]  wrote:

[...]

 Lately, the emacs-unicode mailing list was revived, in the hope that it 
 will boost the activity.  Sadly, the traffic on that list is nil.

Was the list properly announced? I've seen a mention of it, but no
instructions on how to subscribe. Where is the list hosted? It is not
accessible from the emacs pages at http://savannah.gnu.org.

Best regards

Janusz

-- 
dr hab. Janusz S. Bien, prof. UW
Prof. Janusz S. Bien, Warsaw University
http://www.orient.uw.edu.pl/~jsbien/
-
Na tym koncie czytam i wysylam poczte i wiadomosci offline.
On this account I read/post mail/news offline.



Re: unicode in emacs 21

2001-10-28 Thread Eli Zaretskii


On 28 Oct 2001, Janusz S. Bień wrote:

 On Eli Zaretskii [EMAIL PROTECTED]  wrote:
 
 [...]
 
  Lately, the emacs-unicode mailing list was revived, in the hope that it 
  will boost the activity.  Sadly, the traffic on that list is nil.
 
 Was the list properly announced? I've seen a mention of it, but no
 instructions on how to subscribe. Where is the list hosted? It is not
 accessible from the emacs pages at http://savannah.gnu.org.

It's not a public list (and, given the traffic, I'm not convinced it's 
worth the hassle to make it a public one).  However, the people who 
subscribe to that list know they are subscribed (they've asked for that 
explicitly), so no announcement seems to be necessary.

I can subscribe you if you want.



Re: unicode in emacs 21

2001-10-28 Thread Florian Weimer

Eli Zaretskii [EMAIL PROTECTED] writes:

 Why can't you continue to use the MULE code and just change the
 character sets to reflect certain aspects of Unicode?
 
 The current plan for Unicode was discussed at length 3 years ago, and
 the result was what I described.

Is the discussion archived somewhere, or are there some design
documents which resulted from the discussion?

 I don't think it's wise for us to reopen that discussion again,
 unless you think the UTF-8-based representation is a terribly wrong
 design.

Of course, it's hard to come up with constructive criticism when you
don't know what's already there. ;-)

 So I don't see any reason for the unnamed Unicode people to get
 annoyed by a term they themselves coined.

Me neither, but I got flamed in the past. :-/

 Conceivably, changing the internal representation doesn't mean we need
 to rewrite all of the existing code, just the low-level parts of it
 that deal with code conversions (i.e. subroutines of encoding and
 decoding functions).

I still don't understand the need for such a change.  In theory, the
internal representation of characters should be invisible to the
higher levels.



Re: unicode in emacs 21

2001-10-28 Thread Dave Love

 OD == Oliver Doepner [EMAIL PROTECTED] writes:

 OD There is vim 6.x now with full utf-8 support on the xterm.

[Does `full utf-8 support' mean level 3?]

Emacs can do utf-8 i/o under ttys that support it, though you don't
_need_ such support -- either input or output -- to edit utf-8 text.

 OD It is much faster than emacs on x11 of course.

I'm surprised that's much of an issue.  I assume Emacs under X is much
more capable.

 OD I was happy to see Emacs 21 announced. but the unicode support
 OD does not seem to have moved forward very much

It's moved from zero to the state where it's perfectly fine for
editing at least the Western technical text that interests me.  E.g.,
Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can
add support straightforwardly at the Lisp level.  It also allowed
producing coding systems for all the 8-bit charsets for GNUish
locales, which perhaps matters more in the wide world than utf-8 per
se.  With some customization, I can also at least _display_
utf-8-encoded CJK text.  I can send and receive utf-8-encoded mail and
browse utf-8-encoded web sites (with the development W3 package).

The Mule-UCS package provides more if necessary, specifically better
coverage of the BMP.

 OD Is the internal representation still the special MULE format ??~

Yes.  So what?  [There has been much mis-representation of Mule, some
of it malicious.]  There is a yet-unimplemented scheme for coverage up
to U+10 within that encoding.  Even now, with Lisp-level changes
one could build an (incompatible) Emacs to cover the BMP, sacrificing
some of the standard charsets.

-- 
Bragging about Unicode support: ‘2d sinθ = nλ’ is plain text. ☺
URL:http://www.unicode.org/



Re: unicode in emacs 21

2001-10-28 Thread Dave Love

 EZ == Eli Zaretskii [EMAIL PROTECTED] writes:

 EZ The problem is that characters are still not unified in Emacs 21.

A package was contributed to do that for ISO 8859 characters.  It's
been posted to gnu.emacs.sources, so that shouldn't be an issue for
anyone who's bothered by it.

 EZ So we have two versions of Cyrillic characters, two versions of
 EZ Greek characters, two versions of Hebrew characters, etc.:  one
 EZ version in the new Unicode set, the other version in the old Mule
 EZ set.

There are more than two, at least for Greek and Cyrillic.  Those in
the Far Eastern charsets could be unified too if anyone cared.  This
issue clearly doesn't apply only to the Unicode charsets, and, as a
user, I don't think it's much of a problem in practice.

 EZ What can I say except ``volunteers are welcome...'' etc.?  I can't 
 EZ believe no one wants Unicode badly enough to work on its support in 
 EZ Emacs, but what do I do with facts which fly in my face?

That view is unfair to the people who have done lots of work, himi in
particular.  `Working on Unicode support' in my book isn't restricted
to implementing an apparently-unnecessary, disruptive, incompatible
change to the internal encoding, even if it's what one wants ideally.



Re: unicode in emacs 21

2001-10-28 Thread Jimmy Kaplowitz

On Sun, Oct 28, 2001 at 05:04:22PM +, Dave Love wrote:
  OD == Oliver Doepner [EMAIL PROTECTED] writes:
 
  OD There is vim 6.x now with full utf-8 support on the xterm.
 
 [Does `full utf-8 support' mean level 3?]

Well, it handles double-width characters as well as up to two combining
characters. It's the only editor I've used (including Yudit) that could
display the sequence U+0283 U+034D correctly.
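
For readers unfamiliar with that sequence, here is a minimal sketch of how a base character and its combining marks can be grouped into one display unit. It relies only on Python's unicodedata module; the function name display_clusters is hypothetical, and using the canonical combining class as the test is a simplification that misses some marks.

```python
import unicodedata


def display_clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)     # start a new display unit
    return clusters


sample = "\u0283\u034d"
for ch in sample:
    # Code point, character name, canonical combining class
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
print(len(sample), "code points,", len(display_clusters(sample)), "cluster")
```

An editor that renders the two code points in separate cells is showing two clusters where there should be one.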

 Emacs can do utf-8 i/o under ttys that support it, though you don't
 _need_ such support -- either input or output -- to edit utf-8 text.
 
  OD It is much faster than emacs on x11 of course.
 
 I'm surprised that's much of an issue.  I assume Emacs under X is much
 more capable.

Well, Emacs does have more features (including some that are less
essential, such as doctor mode :), but vim has quite enough for most
purposes.

  OD I was happy to see Emacs 21 announced. but the unicode support
  OD does not seem to have moved forward very much
 
 It's moved from zero to the state where it's perfectly fine for
 editing at least the Western technical text that interests me.  E.g.,
 Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can
 add support straightforwardly at the Lisp level.  It also allowed
 producing coding systems for all the 8-bit charsets for GNUish
 locales, which perhaps matters more in the wide world than utf-8 per
 se.  With some customization, I can also at least _display_
 utf-8-encoded CJK text.  I can send and receive utf-8-encoded mail and
 browse utf-8-encoded web sites (with the development W3 package).

Vim can display the UTF-8-demo file perfectly, with no exceptions. Also,
although I haven't tested this, I am told it can write as well as
display utf-8 CJK text.

- Jimmy Kaplowitz
[EMAIL PROTECTED] / [EMAIL PROTECTED]



Re: unicode in emacs 21

2001-10-28 Thread Markus Kuhn

On 28 Oct 2001, Dave Love wrote:
  EZ So we have two versions of Cyrillic characters, two versions of
  EZ Greek characters, two versions of Hebrew characters, etc.:  one
  EZ version in the new Unicode set, the other version in the old Mule
  EZ set.

 There are more than two, at least for Greek and Cyrillic.  Those in
 the Far Eastern charsets could be unified too if anyone cared.

Full unification here would have the disadvantage that CJK Greek/Cyrillic
characters are traditionally displayed as double-width, whereas ISO
8859/ISO 10646 Greek & Cyrillic characters are traditionally displayed
single-width. Some CJK users might be quite happy about a lack of
unification here to preserve the display width of these characters. Same
for the block graphics characters, which xterm with ISO10646 fonts
displays single-width whereas kterm with JIS/etc. fonts displays in
double-width.
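
The width question can be sketched with the Unicode East Asian Width property, which this message does not mention and which is used here only as an illustration. Characters classed "A" (ambiguous) are exactly those whose width depends on context. The function cell_width and its cjk_context flag are hypothetical, and zero-width combining marks are ignored.

```python
import unicodedata


def cell_width(ch: str, cjk_context: bool) -> int:
    """Columns for one character in a terminal (simplified sketch)."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2                            # always double-width
    if eaw == "A":
        return 2 if cjk_context else 1      # width depends on context
    return 1


# Latin a, Greek alpha, Cyrillic zhe, box-drawing line, Han ideograph
for ch in ("a", "\u03b1", "\u0436", "\u2500", "\u4e2d"):
    print(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch),
          cell_width(ch, cjk_context=False),
          cell_width(ch, cjk_context=True))
```

Under full unification the Greek, Cyrillic and block-graphics characters would each need a width chosen by something other than the character code, such as the font or the locale.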

But surely all the European encodings such as ISO 8859, KOI, etc. should
be urgently unified with Unicode. The relevant standards have already been
(re)written to represent these encodings just as single-byte encodings of
ISO 10646 subsets.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/




Re: unicode in emacs 21

2001-10-27 Thread Markus Kuhn

On Thu, 25 Oct 2001, Eli Zaretskii wrote:
  Is the internal representation still the special MULE format ??~

 Yes.  But the internal representation is not the problem here; ideally,
 users and Lisp programs shouldn't be worrying about how characters are
 represented internally.  The problem is that characters are still not
 unified in Emacs 21.

Not entirely.

Internal representation does matter somewhat when it comes to the handling
of malformed UTF-8 sequences. I think it is highly desirable that the
UTF-8 -> Emacs internal -> UTF-8 conversion round trip is made 100% binary
transparent. Loading and saving a file that contains malformed UTF-8
sequences should not change them, but character encoding conversions are
prone to throw away information in the case of invalid source byte
streams.

Using UTF-8 as the internal Emacs encoding is one way of achieving
continued guaranteed binary transparency, coming up with a tricky encoding
for malformed UTF-8 sequences is another one. I favour the former
approach, which is also what other UTF-8 capable modern editors do today.
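
A minimal sketch of the difference between a lossy and a binary-transparent round trip, using Python's own codec error handlers rather than anything in Emacs. The "surrogateescape" handler is an instance of the second approach (a special encoding for malformed bytes); it is shown only because it is easy to run.

```python
# Valid UTF-8 text mixed with stray bytes and a truncated sequence.
malformed = b"caf\xc3\xa9 \xff\xfe ok \xc3"

# Lossy: each invalid byte becomes U+FFFD, so the original bytes are gone.
lossy = malformed.decode("utf-8", errors="replace").encode("utf-8")

# Transparent: invalid bytes are carried through as lone surrogates
# and restored byte-for-byte when encoding.
text = malformed.decode("utf-8", errors="surrogateescape")
restored = text.encode("utf-8", errors="surrogateescape")

print(lossy == malformed)      # False
print(restored == malformed)   # True
```

Keeping the buffer itself as raw UTF-8 bytes avoids the conversion step altogether, which is the first approach described above.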

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/




Re: unicode in emacs 21

2001-10-27 Thread Florian Weimer

Eli Zaretskii [EMAIL PROTECTED] writes:

 Emacs cannot use a pure UTF-8 encoding, since some cultures don't want
 unification, and it was decided that Emacs should not force
 unification on those cultures.

Why can't you continue to use the MULE code and just change the
character sets to reflect certain aspects of Unicode?  One such aspect
is Latin unification, for example.  (The Unicode people get very
annoyed if you talk about unification, source separation rule etc.
in the context of non-Han scripts...)

In a second step, support for normalization, combining characters
etc. would have to be added, but this could be based on the reliable
foundation of the old MULE code.



Re: unicode in emacs 21

2001-10-27 Thread H. Peter Anvin

Followup to:  [EMAIL PROTECTED]
By author:Florian Weimer [EMAIL PROTECTED]
In newsgroup: linux.utf8

 Eli Zaretskii [EMAIL PROTECTED] writes:
 
  Emacs cannot use a pure UTF-8 encoding, since some cultures don't want
  unification, and it was decided that Emacs should not force
  unification on those cultures.
 
 Why can't you continue to use the MULE code and just change the
 character sets to reflect certain aspects of Unicode?  One such aspect
 is Latin unification, for example.  (The Unicode people get very
 annoyed if you talk about unification, source separation rule etc.
 in the context of non-Han scripts...)
 
 In a second step, support for normalization, combining characters
 etc. would have to be added, but this could be based on the reliable
 foundation of the old MULE code.
 

Does that mean you're painting yourself into a corner, though,
requiring manual work to integrate the increasingly Unicode-based
infrastructure support that is becoming available?  Odds are pretty
good that they are.

-hpa
-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt  [EMAIL PROTECTED]



Re: unicode in emacs 21

2001-10-27 Thread D. Dale Gulledge

H. Peter Anvin wrote:

 Does that mean you're painting yourself into a corner, though,
 requiring manual work to integrate the increasingly Unicode-based
 infrastructure support that is becoming available?  Odds are pretty
 good that they are.

Since I volunteered to help with this effort, I'd like to know what's
already out there.  I agree that duplicating functionality in the Emacs
code that is already available from supported free libraries would be a
bad idea unless there is a compelling reason.  Of course, Emacs is
buildable on most systems that have a working C compiler and a standard
implementation of libc.  Depending on anything else, unless it can be
imported into the Emacs source tree, would be a questionable idea.

-- 
D. Dale Gulledge, Sr. Programmer,
[EMAIL PROTECTED]
C, C++, Perl, Unix (AIX, Linux), Oracle, Java,
Internationalization (i18n), Awk.



Re: unicode in emacs 21

2001-10-26 Thread Andreas Schwab

Eli Zaretskii [EMAIL PROTECTED] writes:

| On Thu, 25 Oct 2001, Oliver Doepner wrote:
| 
|  my question: what happened in this area in Emacs 21 ??
| 
| What happened is that Emacs now supports Unicode characters that 
| basically span the BMP with the exception of CJK ideographic characters.
| It also has some initial support for UTF-8.

Note that the Mule-UCS package works fine with Emacs21 and allows you to
send UTF8 mails with Gnus, for example.

Andreas.

-- 
Andreas Schwab  And now for something
[EMAIL PROTECTED]  completely different.
SuSE Labs, SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5



Re: unicode in emacs 21

2001-10-25 Thread D. Dale Gulledge

I haven't meant for anything I've written to indicate that Emacs is not
a useful editor for UTF-8 encoded text.  I have found it quite usable. 
I've had a couple of configuration headaches along the way specifically
because I am simultaneously maintaining files in both UTF-8 and Latin-3.

If the alphabets you use fall within the ranges of characters that Emacs
now handles, I can't see any strong argument not to use Emacs.  I
switched to the prereleases of Emacs 21 a few weeks ago specifically for
the Unicode support.  For me, there was really no option of choosing
anything else, even if I had wanted to.  I am doing some heavily
customized stuff supported by a pile of Emacs Lisp code tailored to my
data over the past 6 1/2 years.  Emacs Lisp has saved me hundreds of
hours.

In the end, I would like to see Emacs use Unicode internally.

Oliver Doepner wrote:

 I was happy to see Emacs 21 announced. but the unicode support does not
 seem to have moved forward very much - as i have heard and read from some
 people.
 
 my question: what happened in this area in Emacs 21 ?? Is the internal
 representation still the special MULE format ??~
 And are there any plans and/or activities to achieve these things ?

-- 
D. Dale Gulledge, Sr. Programmer,
[EMAIL PROTECTED]
C, C++, Perl, Unix (AIX, Linux), Oracle, Java,
Internationalization (i18n), Awk.



Re: unicode in emacs 21

2001-10-25 Thread Eli Zaretskii


On Thu, 25 Oct 2001, Oliver Doepner wrote:

 my question: what happened in this area in Emacs 21 ??

What happened is that Emacs now supports Unicode characters that 
basically span the BMP with the exception of CJK ideographic characters.
It also has some initial support for UTF-8.

 Is the internal
 representation still the special MULE format ??~

Yes.  But the internal representation is not the problem here; ideally, 
users and Lisp programs shouldn't be worrying about how characters are 
represented internally.  The problem is that characters are still not 
unified in Emacs 21.  So we have two versions of Cyrillic characters, two 
versions of Greek characters, two versions of Hebrew characters, etc.: 
one version in the new Unicode set, the other version in the old Mule 
set.  And Emacs thinks these are different characters, so if you mix 
them without converting them, you are in trouble.
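
The effect can be sketched as follows. The tagged (charset, bytes) pair is a hypothetical stand-in for a multi-charset internal representation, not the actual Mule format, and Python's codecs play the role of the conversion tables.

```python
def tagged(ch: str, charset: str) -> tuple:
    """Represent a character by its source charset and its bytes there."""
    return (charset, ch.encode(charset))


def unify(tagged_char: tuple) -> str:
    """Map a (charset, bytes) pair to its Unicode character."""
    charset, raw = tagged_char
    return raw.decode(charset)


alpha_greek = tagged("\u03b1", "iso8859_7")   # Greek alpha from ISO 8859-7
alpha_jis = tagged("\u03b1", "euc_jp")        # Greek alpha from JIS X 0208

print(alpha_greek == alpha_jis)                 # False: treated as different
print(unify(alpha_greek) == unify(alpha_jis))   # True: same after conversion
```

A search for one tagged form will not find the other unless both sides are converted first, which is the trouble described above.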

 And are there any plans and/or activities to achieve these things ?

Oh, we have plenty of plans!  The problem is with volunteers who would 
step forward and actually produce some code that implements those plans.

It might come as a surprise to some that the decision to change the 
internal representation of characters to something that is based on 
Unicode and that unifies the characters--that decision was made several 
years ago (beginning of 1998, to be exact).  At that time, discussions 
were held which produced a detailed design of the new representation.  
What remains is for a few motivated individuals to sit down and code the
darn thing.  Which is where we are today, more than 3 years later.

Lately, the emacs-unicode mailing list was revived, in the hope that it 
will boost the activity.  Sadly, the traffic on that list is nil.

What can I say except ``volunteers are welcome...'' etc.?  I can't 
believe no one wants Unicode badly enough to work on its support in 
Emacs, but what do I do with facts which fly in my face?