Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

2001-02-21 Thread Joel Rees
Hi. I took several minutes to scan through your post and I am not sure what you are asking. Would you like to see some examples, for instance, of real (assigned) code points that require encoding by surrogate pairs to be represented as Java char? Looking at what you are trying to do, I think I

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-21 Thread Marco Cimarosti
Peter Constable: On 02/20/2001 03:34:28 AM Marco Cimarosti wrote: "Unicode is now a 32-bit character encoding standard, although only about one million of codes actually exist, [...] Well, it's probably a better answer to say that Unicode is a 20.1-bit encoding since the direct

Unicode terminology (RE: Perception that Unicode is 16-bit (was:

2001-02-21 Thread Marco Cimarosti
John Hudson wrote: (In French, sans serif is normally named "antique" Which must be very confusing to Germans and others who use 'antiqua' to distinguish seriffed humanists types from blackletter. Antoine Leca replied: And you do believe that Frenchies are _not_ confused by the fact

Re: Perception that Unicode is 16-bit (was: Re: Surrogate

2001-02-21 Thread Antoine Leca
John Hudson wrote: At 09:05 AM 2/20/2001 -0800, Antoine Leca wrote: (In French, sans serif is normally named "antique" Which must be very confusing to Germans and others who use 'antiqua' to distinguish seriffed humanists types from blackletter. And you do believe that Frenchies

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-21 Thread Michael Everson
At 21:25 -0800 2001-02-20, [EMAIL PROTECTED] wrote: And perhaps the Mac people think of MacRoman as "8-bit ASCII." No, we think of it as Mac Roman. -- Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Mob +353 86

Re: New BMP characters (was Re: [very OT] Documentation: beyond

2001-02-21 Thread Thomas Chan
On Wed, 21 Feb 2001, Werner LEMBERG wrote: Section 10.1 of PDUTR #27 "Unicode 3.1" (2000.1.17) gives the sources of the 42,711 new characters as: ... CNS 11643-1992, 15th plane Really? I thought this should be CNS 11643-1986. I think there isn't a 15th plane in the 1992

RE: 8-bit ASCII

2001-02-21 Thread Hart, Edwin F.
IETF defines this as part of one of the very early RFCs on SMTP, FTP, or TELNET but has not defined it in a separate RFC. Essentially, it is 7-bit ASCII (ANSI X3.4-1986) but the IETF may not include the ASCII control characters. Ed Hart Edwin F. Hart [EMAIL PROTECTED] The Johns Hopkins

[OT] What is DEL for?

2001-02-21 Thread Marco Cimarosti
What is the function of ASCII control code 0x7F (DEL) in text interchange? Particularly, what effect or interpretation might it have in communication protocols, terminal protocols and, especially, inside text files? My interest is about the function of this character in *contemporary* platforms

Re: Implementing Complex Unicode Scripts

2001-02-21 Thread Antoine Leca
Apurva Joshi wrote: Re: "Uniscribe is just an implementation of these specifications, and I hope sincerely Microsoft will not hide some "features" into USP10.DLL in order to kill any concurrence." The process of adding new feature support to Uniscribe is not unlike adding newer

OT Unicode terminology

2001-02-21 Thread pandries
Marco Cimarosti wrote: With all that I really don't envy people who, like Patrick Andries, have undertaken the "impossible" task of translating the Unicode documentation into another language, and I look with sympathy at their requests for proof-reading... Thank you for the kind

Re: [OT] What is DEL for?

2001-02-21 Thread Valeriy E. Ushakov
On Wed, Feb 21, 2001 at 06:29:29 -0800, Marco Cimarosti wrote: What is the function of ASCII control code 0x7F (DEL) in text interchange? Particularly, what effect or interpretation might it have in communication protocols, terminal protocols and, especially, inside text files? My

Re: New BMP characters (was Re: [very OT] Documentation: beyond

2001-02-21 Thread Jungshik Shin
On Wed, 21 Feb 2001, Werner LEMBERG wrote: South Korea's PKS 5700 This is a North Korean standard AFAIK. No. AFAIK, PKS stands for 'Proposed Korean Standard' and as such PKS 5700 became KS C 5700 which in turn was renamed KS X 1005-1. Then, what is KS X 1005-1? It's just the Korean

Re: [OT] What is DEL for?

2001-02-21 Thread DougEwell2
In a message dated 2001-02-21 07:03:46 Pacific Standard Time, [EMAIL PROTECTED] writes: What is the function of ASCII control code 0x7F (DEL) in text interchange? Particularly, what effect or interpretation might it have in communication protocols, terminal protocols and, especially,

Re: [OT] What is DEL for?

2001-02-21 Thread Frank da Cruz
Which systems interpret 0x7F as "interrupt process"? I know that this would be 0x03 in DOS (^C), and 0x03, 0x04 or 0x1A in Unix (^C, ^D, and ^Z, respectively), but I know nothing about other systems, e.g. Macintosh. Very long ago, in the Seventh Edition of Unix, the default interrupt

Re: New BMP characters (was Re: [very OT] Documentation: beyond

2001-02-21 Thread Thomas Chan
On Wed, 21 Feb 2001, Jungshik Shin wrote: On Wed, 21 Feb 2001, Jungshik Shin wrote: On Wed, 21 Feb 2001, Werner LEMBERG wrote: South Korea's PKS 5700 This is a North Korean standard AFAIK. No. AFAIK, PKS stands for 'Proposed Korean Standard' and as such PKS 5700 became KS C 5700

Re: New BMP characters (was Re: [very OT] Documentation: beyond

2001-02-21 Thread Jungshik Shin
On Wed, 21 Feb 2001, Jungshik Shin wrote: On Wed, 21 Feb 2001, Werner LEMBERG wrote: South Korea's PKS 5700 This is a North Korean standard AFAIK. No. AFAIK, PKS stands for 'Proposed Korean Standard' and as such PKS 5700 became KS C 5700 which in turn was renamed KS X 1005-1.

Re: [OT] What is DEL for?

2001-02-21 Thread John Cowan
Marco Cimarosti wrote: Which systems interpret 0x7F as "interrupt process"? I know that this would be 0x03 in DOS (^C), and 0x03, 0x04 or 0x1A in Unix (^C, ^D, and ^Z, respectively), but I know nothing about other systems, e.g. Macintosh. Very long ago, in the Seventh Edition of Unix, the

Re: [OT] What is DEL for?

2001-02-21 Thread Valeriy E. Ushakov
On Wed, Feb 21, 2001 at 09:42:53 -0800, Marco Cimarosti wrote: 1) What happens if emacs loads Doug Ewell's text file (I.e. a text file containing "ABCdelDEF") and then saves it? Would the file's content be changed to "ABDEF"? No. I don't think any program interprets file contents in this

Re: New BMP characters (was Re: [very OT] Documentation: beyond

2001-02-21 Thread Thomas Chan
On Wed, 21 Feb 2001, Werner LEMBERG wrote: Section 10.1 of PDUTR #27 "Unicode 3.1" (2000.1.17) gives the sources of the 42,711 new characters as: ... CNS 11643-1992, 15th plane Really? I thought this should be CNS 11643-1986. I think there isn't a 15th plane in the 1992

Re: New BMP characters (was Re: [very OT] Documentation: beyond

2001-02-21 Thread Thomas Chan
On Wed, 21 Feb 2001, Jungshik Shin wrote: On Wed, 21 Feb 2001, Thomas Chan wrote: The unihan.txt file ver 3.0b1 (1999.7.2) lists four K- sources as: K0 KS C 5601-1987 K1 KS C 5657-1991 K2 PKS C 5700-1 1994 K3 PKS C 5700-2 1994 It's very clear what K0 and K1 are, and they

Re: Scripts and alphabets (was: Perception ...)

2001-02-21 Thread Peter_Constable
John Cowan wrote: As I take it, you mean that the Bulgarian alphabet consists of those letters of the Cyrillic script that are necessary and customary in writing Bulgarian... Am I reading you correctly? Yes. - Peter

Re: [OT] What is DEL for?

2001-02-21 Thread John Cowan
Frank da Cruz wrote: DEL does indeed have a use in plain text files that are encoded with Shift-In / Shift-Out to switch between left and right halves of (say) ISO 8859-1 without having to actually put 8-bit characters in the file. Ditto for "higher" levels of ISO-2022 character-set

Re: Korean Line breaking and Editorial Tracking

2001-02-21 Thread Jungshik Shin
On Tue, 13 Feb 2001, Julie Doll Allen wrote: Julie, Thank you for your kind answer. Just to let you know, Ken Whistler and I both filed away your comments and those of others about p. 124. As the editor for 4.0, I've flagged that passage for discussion in the editorial committee once we

Re: [OT] What is DEL for?

2001-02-21 Thread John Cowan
Marco Cimarosti wrote: What is the function of ASCII control code 0x7F (DEL) in text interchange? Particularly, what effect or interpretation might it have in communication protocols, terminal protocols and, especially, inside text files? In general it has none. Some systems interpret it

CJK workers, throw off your chains!

2001-02-21 Thread Kenneth Whistler
The Unihan.txt file for Unicode 3.1 has finally arrived! You can go to the beta update directory location to get your copy: http://www.unicode.org/Public/3.1-Update/ (Sorry, but ftp access is still broken. We'll have all the data files mirrored for ftp access in the near future, but not just

Re: New BMP characters (was Re: [very OT] Documentation: beyond

2001-02-21 Thread Jungshik Shin
On Wed, 21 Feb 2001, Thomas Chan wrote: On Wed, 21 Feb 2001, Jungshik Shin wrote: On Wed, 21 Feb 2001, Jungshik Shin wrote: On Wed, 21 Feb 2001, Werner LEMBERG wrote: South Korea's PKS 5700 This is a North Korean standard AFAIK. No. AFAIK, PKS stands for 'Proposed Korean

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-21 Thread Joel Rees
Marco: Would you mind if I re-post my reply that I forgot to cc to the list? --- missing post What exactly _would_ be wrong with calling UNICODE a thirty-two bit encoding with a couple of ways to represent most of the characters in a smaller

Plain text in Java ResourceBundle

2001-02-21 Thread Richard, Francois M
Related to the "clear" identification of plain text: My group is trying to convince developers to implement Unicode in their systems. So, one of our first tasks is to identify "plain text" in their systems so that we can understand the implication and requirements for implementing Unicode. A

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-21 Thread Tom Lord
What exactly _would_ be wrong with calling UNICODE a thirty-two bit encoding If I have a 32 bit integer type, holding a Unicode code point, I have 11 bits left over to hold other data. That's worth knowing. Btw, saying approximately 20.087 bits (Am I calculating that
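The arithmetic in the snippet above can be checked with a short sketch. It assumes the code space runs from 0 through 0x10FFFF (surrogate values included in the count); the constant and function names are illustrative only and do not come from the thread.

```python
import math

# Assumption: the Unicode code space is 0x0000..0x10FFFF inclusive.
MAX_CODE_POINT = 0x10FFFF
CODE_SPACE_SIZE = MAX_CODE_POINT + 1  # 1,114,112 values


def bits_needed(n_values: int) -> int:
    """Whole bits needed to give each of n_values a distinct number."""
    return (n_values - 1).bit_length()


def fractional_bits(n_values: int) -> float:
    """Information-theoretic size of the code space, in bits."""
    return math.log2(n_values)


if __name__ == "__main__":
    whole = bits_needed(CODE_SPACE_SIZE)
    print("whole bits per code point:", whole)
    print("bits left over in a 32-bit integer:", 32 - whole)
    print("fractional bits:", round(fractional_bits(CODE_SPACE_SIZE), 3))
```

Under that assumption a code point fits in 21 bits, which leaves 11 bits free in a 32-bit integer, and the fractional figure comes out close to the 20.087 quoted in the post.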

Re: More rambling about Han

2001-02-21 Thread Joel Rees
Hi Thomas, I am just a newby making noise and otherwise being obnoxious. I had forgotten to cc the intermediate message to the mailing list, and didn't realize it until after I posted my reply with most of Ken Whistler's reply clipped. I'll waste even more bandwidth and paste the intermediates

An Absurdly Brief Introduction to Unicode (was Re: Perception ...)

2001-02-21 Thread Tom Lord
We've seen several posts about the perception that Unicode is a 16 bit character set encoding. Among those, we've heard anecdotes about the problems people have introducing newcomers to Unicode. Here is a chapter of a reference manual I've been working on. The original manual can be found at

fictional scripts revisited

2001-02-21 Thread Thomas Chan
Hi all, Between January 30-31, there was a thread here entitled "ConScript registry?", in which I mentioned[1] the possibility of non-Western fictional scripts gobbling up codepoints, where I gave two example .jpg files of the kinds of Chinese fictional scripts that exist. Whether those

Inverted breve in Greek?

2001-02-21 Thread Seán Ó Séaghdha
Since there seem to be some people here who know something about Greek diacritics, I'm hoping someone here will be able to help me. I know very little about Greek, as will probably become clear. I'm making a Unicode version of an ASCII representation of an etymological dictionary,