RE: UTF-8 and UTF-16 issues

2000-06-29 Thread Joe_Ross
coding.

Joe

"Jones, Bob" <[EMAIL PROTECTED]> on 06/28/2000 04:56:16 PM
To: "Unicode List" <[EMAIL PROTECTED]>
cc: (bcc: Joe Ross/Tivoli Systems)
Subject: RE: UTF-8 and UTF-16 issues

Has anyone out there taken a cross platform non-Unicode enabled legacy ap

RE: UTF-8 and UTF-16 issues

2000-06-28 Thread Jones, Bob
s NCHAR(10) on SQL Server, CHAR(30) on Oracle, and CHAR(20) on DB2/400.

Thanks,
Bob Jones
[EMAIL PROTECTED]

-Original Message-
From: Edward Cherlin [mailto:[EMAIL PROTECTED]]
Sent: Sunday, June 25, 2000 7:01 PM
To: Unicode List
Subject: Re: UTF-8 and UTF-16 issues

At 2:48 PM -0800 6/19/

Re: UTF-8 and UTF-16 issues

2000-06-25 Thread Edward Cherlin
At 2:48 PM -0800 6/19/00, Markus Scherer wrote:
>"OLeary, Sean (NJ)" wrote:
> > UTF-16 is the 16-bit encoding of Unicode that includes the use of
> > surrogates. This is essentially a fixed width encoding.
>
>certainly not. utf-16, of course, is variable-width: 1 or 2 16-bit
>units per character.
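The "1 or 2 16-bit units per character" point quoted above can be checked directly. The following is only a minimal sketch, not anything posted in the thread; the helper names `utf16_units` and `surrogate_pair` are made up for illustration.

```python
def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one code point."""
    # "utf-16-le" writes no BOM, so the byte count is exactly 2 per unit.
    return len(ch.encode("utf-16-le")) // 2


def surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for ch in ("A", "\u00e9", "\u4e2d", "\U00010400"):
    print(f"U+{ord(ch):04X} -> {utf16_units(ch)} code unit(s)")

high, low = surrogate_pair(0x10400)
print(f"U+10400 -> {high:04X} {low:04X}")
```

Characters in the Basic Multilingual Plane take one unit; anything above U+FFFF takes a surrogate pair, so code that assumes a fixed width will miscount such text.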

Re: UTF-8 and UTF-16 issues

2000-06-20 Thread John Cowan
john wrote:
> So, then, is UTF-32 fixed-width, or must we aim for a UTF-128
> or some such, to end this kind of kludge?

Nope, the 21-bit characters of UTF-32 are sufficient forever. But a user-visible "character" may contain any number of diacritical marks, each of which may require its own 32-b
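To illustrate the distinction drawn here between code points and user-visible characters, a small sketch follows. It assumes Python's standard `unicodedata` module; the function name `describe` and the sample string are illustrative only.

```python
import unicodedata


def describe(text: str) -> dict:
    """Count code points, UTF-32 code units, and combining marks in text."""
    return {
        "code_points": len(text),
        # "utf-32-le" writes no BOM: exactly 4 bytes per code point.
        "utf32_units": len(text.encode("utf-32-le")) // 4,
        # A nonzero canonical combining class marks a combining character.
        "combining_marks": sum(1 for c in text if unicodedata.combining(c)),
    }


# One user-visible letter: "e" + combining acute + combining dot below.
sample = "e\u0301\u0323"
print(describe(sample))

# The highest Unicode code point, U+10FFFF, fits in 21 bits.
print((0x10FFFF).bit_length())
```

UTF-32 is fixed-width per code point, but a single displayed character can still span several code points, so "one unit per character" holds only at the code point level.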

Re: UTF-8 and UTF-16 issues

2000-06-20 Thread Tony Graham
At 19 Jun 2000 19:03 -0800, Tony Graham wrote:
> According to Appendix F, Autodetection of Character Encodings
> (Non-Normative), beginning a parsed entity with the UTF-8 BOM counts
> as:
>
>other: UTF-8 without an encoding declaration, or else the data
>stream is corrupt, fragmenta

Re: UTF-8 and UTF-16 issues

2000-06-19 Thread john
>> "OLeary, Sean (NJ)" wrote: >> UTF-16 is the 16-bit encoding of Unicode that includes the use of >> surrogates. This is essentially a fixed width encoding. > certainly not. utf-16, of course, is variable-width: 1 or 2 16-bit units per > character. certainly the iuc discussion did not spread

Re: UTF-8 and UTF-16 issues

2000-06-19 Thread Tony Graham
At 19 Jun 2000 14:48 -0800, Markus Scherer wrote:
> > the BOM was intended to be used in 16-bit encodings like UTF-16, not in
> > UTF-8.
>
> it is still useful to use the signature byte sequences in all
> unicode encodings. the xml spec, for example lists them as a help
> for the parser. if
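The use of signature byte sequences as "a help for the parser" might look roughly like the sketch below. It is not the XML spec's full autodetection procedure (which also covers input without a BOM); it only checks for a leading signature, using the BOM constants from Python's standard `codecs` module. The name `sniff_bom` is hypothetical.

```python
import codecs
from typing import Optional

# UTF-32 signatures are tested before UTF-16 because the UTF-32LE
# signature (FF FE 00 00) begins with the UTF-16LE signature (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading signature, or None."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None


print(sniff_bom(codecs.BOM_UTF8 + b"<doc/>"))
print(sniff_bom(b"<doc/>"))
```

A `None` result only means no signature was found; the caller would still need a fallback such as an encoding declaration or a default.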

Re: UTF-8 and UTF-16 issues

2000-06-19 Thread Markus Scherer
"OLeary, Sean (NJ)" wrote: > UTF-16 is the 16-bit encoding of Unicode that includes the use of > surrogates. This is essentially a fixed width encoding. certainly not. utf-16, of course, is variable-width: 1 or 2 16-bit units per character. certainly the iuc discussion did not spread this under

UTF-8 and UTF-16 issues

2000-06-16 Thread OLeary, Sean (NJ)
The following is from a document I had put together following the last San José Unicode conference. I would be interested in writing a more complete document with more issues added. Please send me any recommendations you might have. Sean =