Kenneth Whistler [EMAIL PROTECTED] wrote:
So the first step to interoperability in big, interconnected system
software using C is to set up fundamental header files containing
well-defined datatypes of fixed sizes, to make up for the lack of same
in the definition of C itself. The lack of
At 16.10 22.7.2000 -0800, jgo wrote:
Addison wrote:
1. 1 byte != 1 character: deal with it.
Hmm, depends on how you define "byte".
I've seen them in 8-bit, 12-bit, 16-bit and 18-bit varieties.
The trouble, though, is that 1 character (in this context)
can be represented by from 16 bits to
----- Original Message -----
From: "Doug Ewell" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: July 22, 2000 21:24
Subject: Re: Unicode FAQ addendum
John G. Otto, alias "jgo" [EMAIL PROTECTED], wrote:
Addison wrote:
1. 1 byte != 1 ch
1) Unicode code units are not 8 bits long; deal with it.
Joe
How about "1) Unicode characters don't fit in 8 bits; deal with it."
"Code units" isn't really in the spirit of JOVUC.
--
John Cowan [EMAIL PROTECTED]
It is there, however, that the meaning of the
1) The UTF whose bits can be counted is not the eternal UTF.
The encoding that is not in UTR-17 is not a compliant encoding.
UCS-2 is the origin of the BMP.
UTF-16 is the origin of 1,048,576 more code points.
Therefore, constantly use UTF-8 and you'll see the mystery on your mail
Jonathan Rosenne wrote:
2) Byte order is only an issue in I/O.
I accept this change.
--
Weave a circle round him thrice!     || John Cowan [EMAIL PROTECTED]
And close your eyes with holy dread, || http://www.reutershealth.com
For he on honey-dew hath fed,        ||
* John Cowan
|
| C1 says "A process shall interpret Unicode code values as 16-bit
| quantities."
This I find mightily confusing. Why say something like this when
there are (well, will be) characters that cannot be represented with
16 bits in any of the Unicode encodings?
| "Code unit" is
There's no updating needed. The key is that The Unicode Standard, Version
3.0 recognizes UTF-16 as the default encoding. Therefore code values (or
units), which are defined as the 'minimal bit combination that can represent
a unit of encoded text', are 16-bit. In UTF-16, one sometimes needs two of
At 8:00 AM -0800 7/19/00, John Cowan wrote:
The new Unicode FAQ (like the old) supplies the panting world with
John's Own Version of Unicode Conformance:
1) Unicode code units are 16 bits long; deal with it.
2) Byte order is only an issue in files.
I've got to take issue with #2. People can and
How about:
2) Byte order is only an issue in I/O.
Jony
-Original Message-
From: Elliotte Rusty Harold [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 20, 2000 3:19 PM
To: Unicode List
Subject: Re: Unicode FAQ addendum
At 8:00 AM -0800 7/19/00, John Cowan wrote:
| C1 says "A process shall interpret Unicode code values as 16-bit
| quantities."
I think the focus here was supposed to be on the fact that Unicode code
values are *not 8-bit* quantities. I found out about Unicode in late
1991 when I discovered a copy of TUS 1.0 in a bookstore, and for years
| C1 says "A process shall interpret Unicode code values as 16-bit
| quantities."
DE I think the focus here was supposed to be on the fact that Unicode code
DE values are *not 8-bit* quantities.
This may be the path to an update that is pithy yet true. The original
mantra, paraphrased in C1
Becker, Joseph wrote:
terminology in an informal statement, I wouldn't have a problem with the
simple update:
1) Unicode code units are not 8 bits long; deal with it.
how about:
1) Unicode code units are not necessarily 8 bits long [wide]; code points
use 21 bits; deal with it.
rationale:
Narrowing in on it, with one emendation. UTF-8 code units are 8 bits, so we
can't say that.
Mark
Becker, Joseph wrote:
| C1 says "A process shall interpret Unicode code values as 16-bit
| quantities."
DE I think the focus here was supposed to be on the fact that Unicode code
DE values are
The new Unicode FAQ (like the old) supplies the panting world with
John's Own Version of Unicode Conformance:
1) Unicode code units are 16 bits long; deal with it.
2) Byte order is only an issue in files.
3) If you don't have a clue, assume big-endian.
4) Loose surrogates don't mean jack.
5)
John Cowan wrote:
The new Unicode FAQ (like the old) supplies the panting world with
John's Own Version of Unicode Conformance:
some of the old ones seem to be pre-unicode 1.1. should they not be updated?
1) Unicode code units are 16 bits long; deal with it.
this is true for the default
Markus Scherer wrote:
some of the old ones seem to be pre-unicode 1.1. should they not be updated?
No, they are 2.0.
1) Unicode code units are 16 bits long; deal with it.
C1 says "A process shall interpret Unicode code values as 16-bit quantities."
"Code unit" is defined in definition D5