Re: Abnormal Bytes and Unicode: (was Re: Unicode FAQ addendum)

2000-07-24 Thread Torsten Mohrin
Kenneth Whistler [EMAIL PROTECTED] wrote: So the first step to interoperability in big, interconnected system software using C is to set up fundamental header files containing well-defined datatypes of fixed sizes, to make up for the lack of same in the definition of C itself. The lack of

Re: Unicode FAQ addendum

2000-07-23 Thread Paul Keinänen
At 16.10 22.7.2000 -0800, jgo wrote: Addison wrote: 1. 1 byte != 1 character: deal with it. Hmm, depends on how you define "byte". I've seen them in 8-bit, 12-bit, 16-bit and 18-bit varieties. The trouble, though, is that 1 character (in this context) can be represented by from 16 bits to

Octet vs byte (was Unicode FAQ Addendum)

2000-07-22 Thread Patrick Andries
- Original Message - From: "Doug Ewell" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: July 22, 2000 21:24 Subject: Re: Unicode FAQ addendum John G. Otto, alias "jgo" [EMAIL PROTECTED], wrote: Addison wrote: 1. 1 byte != 1 ch

Re: Unicode FAQ addendum

2000-07-21 Thread John Cowan
1) Unicode code units are not 8 bits long; deal with it. Joe How about "1) Unicode characters don't fit in 8 bits; deal with it." "Code units" isn't really in the spirit of JOVUC. -- John Cowan [EMAIL PROTECTED] C'est la` pourtant que se livre le sens du

RE: Unicode FAQ addendum

2000-07-21 Thread Marco . Cimarosti
1) The UTF whose bits can be counted is not the eternal UTF. The encoding that is not in UTR-17 is not a compliant encoding. UCS-2 is the origin of the BMP. UTF-16 is the origin of 1,048,576 more code points. Therefore, constantly use UTF-8 and you'll see the mystery on your mail

Re: Unicode FAQ addendum

2000-07-21 Thread John Cowan
Jonathan Rosenne wrote: 2) Byte order is only an issue in I/O. I accept this change. -- Schlingt dreifach einen Kreis um dies! || John Cowan [EMAIL PROTECTED] Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com Denn er genoss vom Honig-Tau, ||

Re: Unicode FAQ addendum

2000-07-20 Thread Lars Marius Garshol
* John Cowan | | C1 says "A process shall interpret Unicode code values as 16-bit | quantities." This I find mightily confusing. Why say something like this when there are (well, will be) characters that cannot be represented with 16 bits in any of the Unicode encodings? | "Code unit" is

Re: Unicode FAQ addendum

2000-07-20 Thread Asmus Freytag
There's no updating needed. The key is that The Unicode Standard, Version 3.0 recognizes UTF-16 as the default encoding. Therefore code values (or units) which are defined as 'minimal bit combination that can represent a unit of encoded text' are 16-bit. In UTF-16, one sometimes needs two of
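To illustrate the point that UTF-16 code units are 16 bits wide and that some characters need two of them, here is a minimal Python sketch. The function names `utf16_code_units` and `to_surrogate_pair` are hypothetical, chosen for illustration only; the arithmetic is the standard surrogate-pair mapping for code points from U+10000 to U+10FFFF.

```python
def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    # A BMP character takes one code unit; a supplementary character takes two.
    print([hex(u) for u in utf16_code_units("A")])
    print([hex(u) for u in utf16_code_units("\U0001D11E")])
```

A character in the Basic Multilingual Plane yields a single code unit, while U+1D11E yields a high surrogate followed by a low surrogate.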

Re: Unicode FAQ addendum

2000-07-20 Thread Elliotte Rusty Harold
At 8:00 AM -0800 7/19/00, John Cowan wrote: The new Unicode FAQ (like the old) supplies the panting world with John's Own Version of Unicode Conformance: 1) Unicode code units are 16 bits long; deal with it. 2) Byte order is only an issue in files. I've got to take issue with #2. People can and

RE: Unicode FAQ addendum

2000-07-20 Thread Jonathan Rosenne
How about: 2) Byte order is only an issue in I/O. Jony -Original Message- From: Elliotte Rusty Harold [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 20, 2000 3:19 PM To: Unicode List Subject: Re: Unicode FAQ addendum At 8:00 AM -0800 7/19/00, John Cowan wrote

Re: Unicode FAQ addendum

2000-07-20 Thread Doug Ewell
| C1 says "A process shall interpret Unicode code values as 16-bit | quantities." I think the focus here was supposed to be on the fact that Unicode code values are *not 8-bit* quantities. I found out about Unicode in late 1991 when I discovered a copy of TUS 1.0 in a bookstore, and for years

RE: Unicode FAQ addendum

2000-07-20 Thread Becker, Joseph
| C1 says "A process shall interpret Unicode code values as 16-bit | quantities." DE I think the focus here was supposed to be on the fact that Unicode code DE values are *not 8-bit* quantities. This may be the path to an update that is pithy yet true. The original mantra, paraphrased in C1

Re: Unicode FAQ addendum

2000-07-20 Thread Markus Scherer
Becker, Joseph wrote: terminology in an informal statement, I wouldn't have a problem with the simple update: 1) Unicode code units are not 8 bits long; deal with it. how about: 1) Unicode code units are not necessarily 8 bits long [wide], code points use 21 bits; deal with it. rationale:
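The distinction drawn here between code units (whose width depends on the encoding form) and code points (which need 21 bits) can be checked with a short Python sketch. The helper name `code_unit_counts` is hypothetical and used only for illustration.

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def code_unit_counts(text):
    """Count the code units needed for text in each encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 8-bit code units
        "utf-16": len(text.encode("utf-16-be")) // 2,  # 16-bit code units
        "utf-32": len(text.encode("utf-32-be")) // 4,  # 32-bit code units
    }


if __name__ == "__main__":
    print(MAX_CODE_POINT.bit_length())  # bits needed for the largest code point
    for sample in ("A", "\u00e9", "\U0001D11E"):
        print(hex(ord(sample)), code_unit_counts(sample))
```

The same character occupies a different number of code units in each form, which is why a statement about "the" width of a Unicode code unit has to name the encoding form.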

Re: Unicode FAQ addendum

2000-07-20 Thread Mark Davis
Narrowing in on it, with one amendment. UTF-8 code units are 8 bits, so we can't say that. Mark Becker, Joseph wrote: | C1 says "A process shall interpret Unicode code values as 16-bit | quantities." DE I think the focus here was supposed to be on the fact that Unicode code DE values are

Unicode FAQ addendum

2000-07-19 Thread John Cowan
The new Unicode FAQ (like the old) supplies the panting world with John's Own Version of Unicode Conformance: 1) Unicode code units are 16 bits long; deal with it. 2) Byte order is only an issue in files. 3) If you don't have a clue, assume big-endian. 4) Loose surrogates don't mean jack. 5)
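Rules 2 and 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not a conformance implementation: the function name `decode_utf16` is hypothetical, and the fallback to big-endian when no byte order mark is present simply follows rule 3 as quoted.

```python
import codecs


def decode_utf16(data):
    """Decode serialized UTF-16 bytes, assuming big-endian when no BOM is present."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No byte order mark: no clue, so assume big-endian.
    return data.decode("utf-16-be")


if __name__ == "__main__":
    # In memory, a code unit is just a number with no byte order.
    unit = ord("A")
    # Byte order appears only once the code unit is serialized.
    print(unit.to_bytes(2, "big"), unit.to_bytes(2, "little"))
    print(decode_utf16(b"\xff\xfeA\x00"), decode_utf16(b"\x00A"))
```

The integer 0x0041 is the same value on any machine; only its two serialized forms differ, which is where the byte order mark or the big-endian default comes in.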

Re: Unicode FAQ addendum

2000-07-19 Thread Markus Scherer
John Cowan wrote: The new Unicode FAQ (like the old) supplies the panting world with John's Own Version of Unicode Conformance: some of the old ones seem to be pre-unicode 1.1. should they not be updated? 1) Unicode code units are 16 bits long; deal with it. this is true for the default

Re: Unicode FAQ addendum

2000-07-19 Thread John Cowan
Markus Scherer wrote: some of the old ones seem to be pre-unicode 1.1. should they not be updated? No, they are 2.0. 1) Unicode code units are 16 bits long; deal with it. C1 says "A process shall interpret Unicode code values as 16-bit quantities." "Code unit" is defined in definition D5