coding.
Joe
"Jones, Bob" <[EMAIL PROTECTED]> on 06/28/2000 04:56:16 PM
To: "Unicode List" <[EMAIL PROTECTED]>
cc: (bcc: Joe Ross/Tivoli Systems)
Subject: RE: UTF-8 and UTF-16 issues
Has anyone out there taken a cross platform non-Unicode enabled legacy
ap
s NCHAR(10) on SQL Server, CHAR(30) on Oracle, and
CHAR(20) on DB2/400.
Thanks,
Bob Jones
[EMAIL PROTECTED]
-----Original Message-----
From: Edward Cherlin [mailto:[EMAIL PROTECTED]]
Sent: Sunday, June 25, 2000 7:01 PM
To: Unicode List
Subject: Re: UTF-8 and UTF-16 issues
At 2:48 PM -0800 6/19/00, Markus Scherer wrote:
>"OLeary, Sean (NJ)" wrote:
> > UTF-16 is the 16-bit encoding of Unicode that includes the use of
> > surrogates. This is essentially a fixed width encoding.
>
>certainly not. utf-16, of course, is variable-width: 1 or 2 16-bit
>units per character.
john wrote:
> So, then, is UTF-32 fixed-width, or must we aim for a UTF-128
> or some such, to end this kind of kludge?
Nope, the 21-bit characters of UTF-32 are sufficient forever.
But a user-visible "character" may contain any number of diacritical
marks, each of which may require its own 32-b
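A rough Python sketch of the distinctions drawn in this exchange, offered as an illustration rather than as anything the posters wrote: UTF-16 needs one or two 16-bit units per code point, UTF-32 always needs one 32-bit unit, and a single user-visible character can still span several code points once combining marks are attached. The helper names `code_units` and `combining_marks` and the sample strings are assumptions chosen for the example.

```python
import unicodedata

# Bytes per code unit for the BOM-less codec names used below.
UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}

def code_units(text: str, encoding: str) -> int:
    """Return how many code units `text` occupies in `encoding`."""
    return len(text.encode(encoding)) // UNIT_SIZE[encoding]

def combining_marks(text: str) -> int:
    """Count code points with a non-zero canonical combining class."""
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)

bmp_char = "\u00e9"            # U+00E9, inside the Basic Multilingual Plane
supplementary = "\U00010400"   # U+10400, outside the BMP
stacked = "e\u0301\u0323"      # e + combining acute + combining dot below

# UTF-16: one unit inside the BMP, a surrogate pair outside it.
print(code_units(bmp_char, "utf-16-le"), code_units(supplementary, "utf-16-le"))

# UTF-32: one unit per code point in both cases.
print(code_units(bmp_char, "utf-32-le"), code_units(supplementary, "utf-32-le"))

# One user-visible character, three code points, two of them combining marks.
print(len(stacked), combining_marks(stacked), code_units(stacked, "utf-32-le"))
```

The "-le" codec names are used only so that Python does not prepend a byte order mark, which would otherwise inflate the counts by one unit.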
At 19 Jun 2000 19:03 -0800, Tony Graham wrote:
> According to Appendix F, Autodetection of Character Encodings
> (Non-Normative), beginning a parsed entity with the UTF-8 BOM counts
> as:
>
>other: UTF-8 without an encoding declaration, or else the data
>stream is corrupt, fragmenta
>> "OLeary, Sean (NJ)" wrote:
>> UTF-16 is the 16-bit encoding of Unicode that includes the use of
>> surrogates. This is essentially a fixed width encoding.
> certainly not. utf-16, of course, is variable-width: 1 or 2 16-bit units per
> character. certainly the iuc discussion did not spread
At 19 Jun 2000 14:48 -0800, Markus Scherer wrote:
> > the BOM was intended to be used in 16-bit encodings like UTF-16, not in
> > UTF-8.
>
> it is still useful to use the signature byte sequences in all
> unicode encodings. the xml spec, for example lists them as a help
> for the parser. if
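As a minimal sketch of how signature byte sequences can help a parser, the following Python checks only for a leading byte order mark. It is not the full autodetection procedure from the XML specification, which also inspects the bytes of the XML declaration when no signature is present. The function name `sniff_signature` and the table layout are assumptions made for the example; the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# UTF-32 signatures are tested first because the UTF-32 LE signature
# (FF FE 00 00) begins with the UTF-16 LE signature (FF FE).
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_signature(data: bytes):
    """Return (encoding, signature_length), or (None, 0) if no signature."""
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

sample = codecs.BOM_UTF8 + "<doc/>".encode("utf-8")
encoding, skip = sniff_signature(sample)
if encoding is not None:
    # Decode the payload that follows the signature.
    print(encoding, sample[skip:].decode(encoding))
```

A signature-only check has a known blind spot: UTF-16 LE data whose first character after the signature is U+0000 is indistinguishable from UTF-32 LE, so a real parser would treat the result as a hint and not as proof.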
"OLeary, Sean (NJ)" wrote:
> UTF-16 is the 16-bit encoding of Unicode that includes the use of
> surrogates. This is essentially a fixed width encoding.
certainly not. utf-16, of course, is variable-width: 1 or 2 16-bit units per
character. certainly the iuc discussion did not spread this under
The following is from a document I had put together following the last San
José Unicode conference. I would be interested in writing a more complete
document with more issues added. Please send me any recommendations you
might have.
Sean
=