Re: unicode in emacs 21
Eli Zaretskii [EMAIL PROTECTED] writes: The GNU Emacs/Unicode proposal I've seen seems to have this property, too. (At least the proposal is ambiguous, and one interpretation is that you can encode a single character in multiple ways.) Unless you refer to the CNS plane and Japanese Han characters, which were deliberately left ununified (in addition to the Unicode codepoints for those characters), I think you are mistaken. I hope so. ;-) Could you please point out where in the proposal do you see that a character can be encoded in multiple ways? I think now that the surrogate stuff has been explained, the encoding to UCS-E (Unicode-compatible Character Set for Emacs) is indeed unambiguous. However, UTF-E (the buffer encoding) opens possibilities for different encodings of the same UCS-E code point, but this can be resolved, I think. -- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: unicode in emacs 21
JK == Jimmy Kaplowitz [EMAIL PROTECTED] writes: JK It's the only editor I've used (including Yudit) that could JK display the sequence U+0283 U+034D correctly. [With what font?] Note that character composition (combination) is a user-level feature in Emacs, so if rules are implemented which you don't like, you can change them. JK Well, Emacs does have more features (including some that are less JK essential, such as doctor mode :), but vim has quite enough for JK most purposes. I assumed the point was specifically about the display, tty v. X.
Re: unicode in emacs 21
EZ == Eli Zaretskii [EMAIL PROTECTED] writes: EZ The current plan for Unicode was discussed at length 3 years ago, and EZ the result was what I described. I don't think it's wise for us to EZ reopen that discussion again Well I, at least, don't understand why it's necessary, at least for technical reasons. I have a fair amount of experience as a user and implementor.
Re: unicode in emacs 21
EZ == Eli Zaretskii [EMAIL PROTECTED] writes: EZ Unless you refer to the CNS plane and Japanese Han characters, EZ which were deliberately left ununified (in addition to the EZ Unicode codepoints for those characters), I think you are EZ mistaken. I.e., he's right. Someone needs to give a cogent argument why it's a problem in practice to have multiple representations if you can canonicalize as required, especially why this should be any different for Western scripts than for CJK. Note that I have some practical experience of this in Emacs.
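[Editorial illustration] The "canonicalize as required" idea can be sketched with an analogy from Unicode itself, where one character may also have several representations. This is not Emacs or Mule code: it assumes Python's standard `unicodedata` module, and the helper name `canonical_equal` is made up for the example. The two strings differ code point by code point but compare equal once both are brought to a canonical form.

```python
import unicodedata


def canonical_equal(a: str, b: str) -> bool:
    """Compare two strings after canonicalizing both to NFC.

    Hypothetical helper for illustration; Emacs would canonicalize
    between its own charsets, not between Unicode normalization forms.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Raw comparison sees two different representations ...
print(precomposed == decomposed)
# ... while comparison after canonicalization treats them as one character.
print(canonical_equal(precomposed, decomposed))
```

The design point is the one made in the message: multiple representations only hurt where code compares or searches without canonicalizing first.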
Re: unicode in emacs 21
Richard Stallman [EMAIL PROTECTED] writes: Supporting Unicode superficially while retaining the current internal representation raises a number of problems, one of them being that the internal representation has several alternatives for the same character which correspond to the same code point in Unicode. The GNU Emacs/Unicode proposal I've seen seems to have this property, too. (At least the proposal is ambiguous, and one interpretation is that you can encode a single character in multiple ways.)
Re: unicode in emacs 21
From: Florian Weimer [EMAIL PROTECTED] Date: Tue, 30 Oct 2001 08:09:20 +0100 Richard Stallman [EMAIL PROTECTED] writes: Supporting Unicode superficially while retaining the current internal representation raises a number of problems, one of them being that the internal representation has several alternatives for the same character which correspond to the same code point in Unicode. The GNU Emacs/Unicode proposal I've seen seems to have this property, too. (At least the proposal is ambiguous, and one interpretation is that you can encode a single character in multiple ways.) Unless you refer to the CNS plane and Japanese Han characters, which were deliberately left ununified (in addition to the Unicode codepoints for those characters), I think you are mistaken. Could you please point out where in the proposal do you see that a character can be encoded in multiple ways?
Re: unicode in emacs 21
That view is unfair to the people who have done lots of work, himi in particular. `Working on Unicode support' in my book isn't restricted to implementing an apparently-unnecessary, disruptive, incompatible change to the internal encoding, even if it's what one wants ideally. I think that supporting Unicode at the internal level is the best way to support it fully, and that's what we have decided to do. As a result of that decision, we are sometimes reluctant to put time into studying, installing and maintaining other approaches which would be obsolete once we do it the right way. Supporting Unicode superficially while retaining the current internal representation raises a number of problems, one of them being that the internal representation has several alternatives for the same character which correspond to the same code point in Unicode.
Re: unicode in emacs 21
MK == Markus Kuhn [EMAIL PROTECTED] writes: MK CJK Greek/Cyrillic characters are traditionally displayed as MK double-width, whereas ISO 8859/ISO 10646 Greek Cyrillic MK characters are traditionally displayed single-width. Yes, but... MK But surely all the European encodings such as ISO 8859, KOI, MK etc. should be urgently unified with Unicode. The implementation you may recall hearing about earlier in the year is now available (posted to gnu.emacs.sources).
Re: unicode in emacs 21
MK == Markus Kuhn [EMAIL PROTECTED] writes: MK Using UTF-8 as the internal Emacs encoding is one way of achieving MK continued guaranteed binary transparency, I.e., maintain a malformed internal representation?? MK coming up with a tricky encoding for malformed UTF-8 sequences is MK another one. We can maintain arbitrary byte sequences now. It's not terribly tricky, just not too robust through the use of the eight-bit-x charsets. I don't think it's very important that reading and writing malformed sequences by utf-8.el isn't always idempotent. Presumably the three or four relevant test cases could be addressed in the CCL, but I think there are better things to spend the time on.
Re: unicode in emacs 21
[I suggest to have this discussion on emacs-unicode mailing list, so I added it to the list of addressees.] From: Florian Weimer [EMAIL PROTECTED] Date: Sun, 28 Oct 2001 00:58:29 +0200 Eli Zaretskii [EMAIL PROTECTED] writes: Emacs cannot use a pure UTF-8 encoding, since some cultures don't want unification, and it was decided that Emacs should not force unification on those cultures. Why can't you continue to use the MULE code and just change the character sets to reflect certain aspects of Unicode? The current plan for Unicode was discussed at length 3 years ago, and the result was what I described. I don't think it's wise for us to reopen that discussion again, unless you think the UTF-8-based representation is a terribly wrong design. One such aspect is Latin unification, for example. (The Unicode people get very annoyed if you talk about unification, source separation rule etc. in the context of non-Han scripts...) IIRC, the term unification appears early in the Unicode standard, not necessarily in conjunction with ``Han unification''. It is cited as one of the principles on the Unicode approach. So I don't see any reason for the unnamed Unicode people to get annoyed by a term they themselves coined. In a second step, support for normalization, combining characters etc. would have to be added, but this could be based on the reliable foundation of the old MULE code. Conceivably, changing the internal representation doesn't mean we need to rewrite all of the existing code, just the low-level parts of it that deal with code conversions (i.e. subroutines of encoding and decoding functions). - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: unicode in emacs 21
H. Peter Anvin [EMAIL PROTECTED] writes: Does that mean you're painting yourself into a corner, though, requiring manual work to integrate the increasingly Unicode-based infrastructure support that is becoming available? Odds are pretty good that they are. I don't think it is a good idea to use operating system Unicode support. This would mean that GNU Emacs behaves differently on different operating systems, depending on the installed locale descriptions, for example. OTOH, the character encodings posted earlier to this list are as incompatible with existing Unicode support as the current emacs-mule internal encoding. In effect, just one Emacs-specific internal encoding is replaced by another.
Re: unicode in emacs 21
On Eli Zaretskii [EMAIL PROTECTED] wrote: [...] Lately, the emacs-unicode mailing list was revived, in the hope that it will boost the activity. Sadly, the traffic on that list is nil. Was the list properly announced? I've seen a mention of it, but no instruction how to subscribe. Where is the list hosted? It is not accessible from the emacs pages at http://savannah.gnu.org. Best regards Janusz -- dr hab. Janusz S. Bien, prof. UW Prof. Janusz S. Bien, Warsaw University http://www.orient.uw.edu.pl/~jsbien/ - Na tym koncie czytam i wysylam poczte i wiadomosci offline. On this account I read/post mail/news offline.
Re: unicode in emacs 21
On 28 Oct 2001, Janusz S. Bień wrote: On Eli Zaretskii [EMAIL PROTECTED] wrote: [...] Lately, the emacs-unicode mailing list was revived, in the hope that it will boost the activity. Sadly, the traffic on that list is nil. Was the list properly announced? I've seen a mention of it, but no instruction how to subscribe. Where is the list hosted? It is not accessible from the emacs pages at http://savannah.gnu.org. It's not a public list (and, given the traffic, I'm not convinced it's worth the hassle to make it a public one). However, the people who subscribe to that list know they are subscribed (they've asked for that explicitly), so no announcement seems to be necessary. I can subscribe you if you want.
Re: unicode in emacs 21
Eli Zaretskii [EMAIL PROTECTED] writes: Why can't you continue to use the MULE code and just change the character sets to reflect certain aspects of Unicode? The current plan for Unicode was discussed at length 3 years ago, and the result was what I described. Is the discussion archived somewhere, or are there some design documents which resulted from the discussion? I don't think it's wise for us to reopen that discussion again, unless you think the UTF-8-based representation is a terribly wrong design. Of course, it's hard to come up with constructive criticism when you don't know what's already there. ;-) So I don't see any reason for the unnamed Unicode people to get annoyed by a term they themselves coined. Me neither, but I got flamed in the past. :-/ Conceivably, changing the internal representation doesn't mean we need to rewrite all of the existing code, just the low-level parts of it that deal with code conversions (i.e. subroutines of encoding and decoding functions). I still don't understand the need for such a change. In theory, the internal representation of characters should be invisible to the higher levels. - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: unicode in emacs 21
OD == Oliver Doepner [EMAIL PROTECTED] writes: OD There is vim 6.x now with full utf-8 support on the xterm. [Does `full utf-8 support' mean level 3?] Emacs can do utf-8 i/o under ttys that support it, though you don't _need_ such support -- either input or output -- to edit utf-8 text. OD It is much faster than emacs on x11 of course. I'm surprised that's much of an issue. I assume Emacs under X is much more capable. OD I was happy to see Emacs 21 announced. but the unicode support OD does not seem to have moved forward very much It's moved from zero to the state where it's perfectly fine for editing at least the Western technical text that interests me. E.g., Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can add support straightforwardly at the Lisp level. It also allowed producing coding systems for all the 8-bit charsets for GNUish locales, which perhaps matters more in the wide world than utf-8 per se. With some customization, I can also at least _display_ utf-8-encoded CJK text. I can send and receive utf-8-encoded mail and browse utf-8-encoded web sites (with the development W3 package). The Mule-UCS package provides more if necessary, specifically better coverage of the BMP. OD Is the internal representation still the special MULE format ??~ Yes. So what? [There has been much mis-representation of Mule, some of it malicious.] There is a yet-unimplemented scheme for coverage up to U+10 within that encoding. Even now, with Lisp-level changes one could build an (incompatible) Emacs to cover the BMP, sacrificing some of the standard charsets. -- Bragging about Unicode support: ‘2d sinθ = nλ’ is plain text. ☺ URL:http://www.unicode.org/ - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: unicode in emacs 21
EZ == Eli Zaretskii [EMAIL PROTECTED] writes: EZ The problem is that characters are still not unified in Emacs 21. A package was contributed to do that for ISO 8859 characters. It's been posted to gnu.emacs.sources, so that shouldn't be an issue for anyone who's bothered by it. EZ So we have two versions of Cyrillic characters, two versions of EZ Greek characters, two versions of Hebrew characters, etc.: one EZ version in the new Unicode set, the other version in the old Mule EZ set. There are more than two, at least for Greek and Cyrillic. Those in the Far Eastern charsets could be unified too if anyone cared. This issue clearly doesn't apply only to the Unicode charsets, and, as a user, I don't think it's much of a problem in practice. EZ What can I say except ``volunteers are welcome...'' etc.? I can't EZ believe no one wants Unicode badly enough to work on its support in EZ Emacs, but what do I do with facts which fly in my face? That view is unfair to the people who have done lots of work, himi in particular. `Working on Unicode support' in my book isn't restricted to implementing an apparently-unnecessary, disruptive, incompatible change to the internal encoding, even if it's what one wants ideally. - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: unicode in emacs 21
On Sun, Oct 28, 2001 at 05:04:22PM +, Dave Love wrote: OD == Oliver Doepner [EMAIL PROTECTED] writes: OD There is vim 6.x now with full utf-8 support on the xterm. [Does `full utf-8 support' mean level 3?] Well, it handles double-width characters as well as up to two combining characters. It's the only editor I've used (including Yudit) that could display the sequence U+0283 U+034D correctly. Emacs can do utf-8 i/o under ttys that support it, though you don't _need_ such support -- either input or output -- to edit utf-8 text. OD It is much faster than emacs on x11 of course. I'm surprised that's much of an issue. I assume Emacs under X is much more capable. Well, Emacs does have more features (including some that are less essential, such as doctor mode :), but vim has quite enough for most purposes. OD I was happy to see Emacs 21 announced. but the unicode support OD does not seem to have moved forward very much It's moved from zero to the state where it's perfectly fine for editing at least the Western technical text that interests me. E.g., Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can add support straightforwardly at the Lisp level. It also allowed producing coding systems for all the 8-bit charsets for GNUish locales, which perhaps matters more in the wide world than utf-8 per se. With some customization, I can also at least _display_ utf-8-encoded CJK text. I can send and receive utf-8-encoded mail and browse utf-8-encoded web sites (with the development W3 package). Vim can display the UTF-8-demo file perfectly, with no exceptions. Also, although I haven't tested this, I am told it can write as well as display utf-8 CJK text. - Jimmy Kaplowitz [EMAIL PROTECTED] / [EMAIL PROTECTED] PGP signature
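[Editorial illustration] The sequence U+0283 U+034D mentioned above is a base letter followed by a combining mark, so a display has to draw both code points in one cell. A minimal sketch of that grouping, assuming Python's standard `unicodedata` module (the function name `group_combining` is hypothetical, and this is neither vim's nor Emacs's actual composition logic):

```python
import unicodedata


def group_combining(text: str) -> list[str]:
    """Attach each combining mark to the preceding base character.

    Simplified sketch: it only looks at the canonical combining class,
    so it is not a full grapheme-cluster segmentation.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # mark rides on the previous base character
        else:
            clusters.append(ch)  # new base character, new display cell
    return clusters


sample = "\u0283\u034d"  # LATIN SMALL LETTER ESH + COMBINING LEFT RIGHT ARROW BELOW
print(len(sample), "code points,", len(group_combining(sample)), "display cell(s)")
```

An editor that handles "up to two combining characters" would, under this sketch, cope with clusters of length three or less.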
Re: unicode in emacs 21
On 28 Oct 2001, Dave Love wrote: EZ So we have two versions of Cyrillic characters, two versions of EZ Greek characters, two versions of Hebrew characters, etc.: one EZ version in the new Unicode set, the other version in the old Mule EZ set. There are more than two, at least for Greek and Cyrillic. Those in the Far Eastern charsets could be unified too if anyone cared. Full unification here would have the disadvantage that CJK Greek/Cyrillic characters are traditionally displayed as double-width, whereas ISO 8859/ISO 10646 Greek Cyrillic characters are traditionally displayed single-width. Some CJK users might be quite happy about a lack of unification here to preserve the display width of these characters. Same for the block graphics characters, which xterm with ISO10646 fonts displays single-width whereas kterm with JIS/etc. fonts displays in double-width. But surely all the European encodings such as ISO 8859, KOI, etc. should be urgently unified with Unicode. The relevant standards have already been (re)written to represent these encodings just as single-byte encodings of ISO 10646 subsets. Markus -- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: http://www.cl.cam.ac.uk/~mgk25/ - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
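[Editorial illustration] The width question raised above is visible in the Unicode character database itself: Greek, Cyrillic and box-drawing characters carry the East Asian Width class "A" (ambiguous), meaning they are narrow in a Western context and wide in a legacy CJK one. A small sketch, assuming Python's standard `unicodedata` module (the helper name `width_class` is made up for the example):

```python
import unicodedata


def width_class(ch: str) -> str:
    """Return the East Asian Width class of a single character."""
    return unicodedata.east_asian_width(ch)


# Latin 'a', Greek alpha, Cyrillic a, a box-drawing line, a CJK ideograph.
for ch in ("a", "\u03b1", "\u0430", "\u2500", "\u4e00"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {width_class(ch)}")
```

A unified character therefore cannot carry its display width on its own; the terminal or font context (xterm versus kterm in the message above) has to decide it.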
Re: unicode in emacs 21
On Thu, 25 Oct 2001, Eli Zaretskii wrote: Is the internal representation still the special MULE format ??~ Yes. But the internal representation is not the problem here; ideally, users and Lisp programs shouldn't be worrying about how characters are represented internally. The problem is that characters are still not unified in Emacs 21. Not entirely. Internal representation does matter somewhat when it comes to the handling of malformed UTF-8 sequences. I think it is highly desirable that the UTF-8 -> Emacs internal -> UTF-8 conversion round trip is made 100% binary transparent. Loading and saving a file that contains malformed UTF-8 sequences should not change them, but character encoding conversions are prone to throw away information in the case of invalid source byte streams. Using UTF-8 as the internal Emacs encoding is one way of achieving continued guaranteed binary transparency; coming up with a tricky encoding for malformed UTF-8 sequences is another one. I favour the former approach, which is also what other UTF-8 capable modern editors do today. Markus -- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: http://www.cl.cam.ac.uk/~mgk25/
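[Editorial illustration] The second option in the message above, a special internal encoding for malformed bytes, can be sketched with Python's `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate code point and maps it back on output. This is only an analogy under that assumption, not what Emacs or utf-8.el does; `load` and `save` are hypothetical names.

```python
def load(raw: bytes) -> str:
    """Decode UTF-8, smuggling each invalid byte through as U+DC80..U+DCFF."""
    return raw.decode("utf-8", errors="surrogateescape")


def save(text: str) -> bytes:
    """Encode back to bytes, restoring the smuggled bytes unchanged."""
    return text.encode("utf-8", errors="surrogateescape")


# Valid "café", then two stray bytes, then a truncated two-byte sequence.
raw = b"caf\xc3\xa9 \xff\xfe \xc3"

round_tripped = save(load(raw))
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

print(round_tripped == raw)  # transparent round trip
print(lossy == raw)          # replacement characters discard the original bytes
```

The lossy variant shows the failure mode the message warns about: a conversion that substitutes U+FFFD cannot write the file back byte for byte.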
Re: unicode in emacs 21
Eli Zaretskii [EMAIL PROTECTED] writes: Emacs cannot use a pure UTF-8 encoding, since some cultures don't want unification, and it was decided that Emacs should not force unification on those cultures. Why can't you continue to use the MULE code and just change the character sets to reflect certain aspects of Unicode? One such aspect is Latin unification, for example. (The Unicode people get very annoyed if you talk about unification, source separation rule etc. in the context of non-Han scripts...) In a second step, support for normalization, combining characters etc. would have to be added, but this could be based on the reliable foundation of the old MULE code.
Re: unicode in emacs 21
Followup to: [EMAIL PROTECTED] By author:Florian Weimer [EMAIL PROTECTED] In newsgroup: linux.utf8 Eli Zaretskii [EMAIL PROTECTED] writes: Emacs cannot use a pure UTF-8 encoding, since some cultures don't want unification, and it was decided that Emacs should not force unification on those cultures. Why can't you continue to use the MULE code and just change the character sets to reflect certain aspects of Unicode? One such aspect is Latin unification, for example. (The Unicode people get very annoyed if you talk about unification, source separation rule etc. in the context of non-Han scripts...) In a second step, support for normalization, combining characters etc. would have to be added, but this could be based on the reliable foundation of the old MULE code. Does that mean you're painting yourself into a corner, though, requiring manual work to integrate the increasingly Unicode-based infrastructure support that is becoming available? Odds are pretty good that they are. -hpa -- [EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private! Unix gives you enough rope to shoot yourself in the foot. http://www.zytor.com/~hpa/puzzle.txt[EMAIL PROTECTED] - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: unicode in emacs 21
H. Peter Anvin wrote: Does that mean you're painting yourself into a corner, though, requiring manual work to integrate the increasingly Unicode-based infrastructure support that is becoming available? Odds are pretty good that they are. Since I volunteered to help with this effort, I'd like to know what's already out there. I agree that duplicating functionality in the Emacs code that is already available from supported free libraries would be a bad idea unless there is a compelling reason. Of course, Emacs is buildable on most systems that have a working C compiler and a standard implementation of libc. Depending on anything else, unless it can be imported into the Emacs source tree, would be a questionable idea. -- D. Dale Gulledge, Sr. Programmer, [EMAIL PROTECTED] C, C++, Perl, Unix (AIX, Linux), Oracle, Java, Internationalization (i18n), Awk.
Re: unicode in emacs 21
Eli Zaretskii [EMAIL PROTECTED] writes: | On Thu, 25 Oct 2001, Oliver Doepner wrote: | | my question: what happened in this area in Emacs 21 ?? | | What happened is that Emacs now supports Unicode characters that | basically span the BMP with the exception of CJK ideographic characters. | It also has some initial support for UTF-8. Note that the Mule-UCS package works fine with Emacs21 and allows you to send UTF8 mails with Gnus, for example. Andreas. -- Andreas Schwab And now for something [EMAIL PROTECTED] completely different. SuSE Labs, SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: unicode in emacs 21
I haven't meant for anything I've written to indicate that Emacs is not a useful editor for UTF-8 encoded text. I have found it quite usable. I've had a couple of configuration headaches along the way specifically because I am simultaneously maintaining files in both UTF-8 and Latin-3. If the alphabets you use fall within the ranges of characters that Emacs now handles, I can't see any strong argument not to use Emacs. I switched to the prereleases of Emacs 21 a few weeks ago specifically for the Unicode support. For me, there was really no option of choosing anything else, even if I had wanted to. I am doing some heavily customized stuff supported by a pile of Emacs Lisp code tailored to my data over the past 6 1/2 years. Emacs Lisp has saved me hundreds of hours. In the end, I would like to see Emacs use Unicode internally. Oliver Doepner wrote: I was happy to see Emacs 21 announced. but the unicode support does not seem to have moved forward very much - as i have heard and read from some people. my question: what happened in this area in Emacs 21 ?? Is the internal representation still the special MULE format ??~ And are there any plans and/or activities to achieve these things ? -- D. Dale Gulledge, Sr. Programmer, [EMAIL PROTECTED] C, C++, Perl, Unix (AIX, Linux), Oracle, Java, Internationalization (i18n), Awk.
Re: unicode in emacs 21
On Thu, 25 Oct 2001, Oliver Doepner wrote: my question: what happened in this area in Emacs 21 ?? What happened is that Emacs now supports Unicode characters that basically span the BMP with the exception of CJK ideographic characters. It also has some initial support for UTF-8. Is the internal representation still the special MULE format ??~ Yes. But the internal representation is not the problem here; ideally, users and Lisp programs shouldn't be worrying about how characters are represented internally. The problem is that characters are still not unified in Emacs 21. So we have two versions of Cyrillic characters, two versions of Greek characters, two versions of Hebrew characters, etc.: one version in the new Unicode set, the other version in the old Mule set. And Emacs thinks these are different characters, so if you mix them without converting them, you are in trouble. And are there any plans and/or activities to achieve these things ? Oh, we have plenty of plans! The problem is with volunteers who would step forward and actually produce some code that implements those plans. It might come as a surprise to some that the decision to change the internal representation of characters to something that is based on Unicode and that unifies the characters--that decision was made several years ago (beginning of 1998, to be exact). At that time, discussions were held which produced a detailed design of the new representation. What remains is for few motivated individuals to sit down and code the darn thing. Which is where we are today, more than 3 years later. Lately, the emacs-unicode mailing list was revived, in the hope that it will boost the activity. Sadly, the traffic on that list is nil. What can I say except ``volunteers are welcome...'' etc.? I can't believe no one wants Unicode badly enough to work on its support in Emacs, but what do I do with facts which fly in my face? - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
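[Editorial illustration] The "two versions of Cyrillic characters" problem described above disappears once every legacy charset is decoded to the same Unicode code point. A rough sketch of that unification using Python's built-in codecs (not Emacs code; the byte values are the encodings of CYRILLIC SMALL LETTER A in each charset, and the `samples` table is made up for the example):

```python
# One byte per legacy charset, each encoding CYRILLIC SMALL LETTER A.
samples = {
    "iso8859_5": b"\xd0",
    "koi8_r": b"\xc1",
    "cp1251": b"\xe0",
}

# Decoding maps every charset-specific byte to a single code point.
decoded = {name: raw.decode(name) for name, raw in samples.items()}

for name, ch in decoded.items():
    print(f"{name:10} -> U+{ord(ch):04X}")

# After unification there is exactly one character, so mixing text from
# different sources no longer yields look-alike but unequal characters.
unified = set(decoded.values())
print(len(unified))
```

In an un-unified representation these three bytes would stay three distinct characters, which is the situation the message says gets users "in trouble" when text from different charsets is mixed.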