UTF-8

2001-06-12 Thread Bill Kurmey
When folks tell me not to worry as in my previous query, then I really do get worried and try not to be hurried. (last clause not of semantic intent but scans well.) 8-) So a few points of clarification please. Will the Unicode version of UTF-8 be registered with IANA and, if so, what will

Re: UTF-16 problems

2001-06-12 Thread DougEwell2
In a message dated 2001-06-11 21:46:38 Pacific Daylight Time, [EMAIL PROTECTED] writes: Shouldn't a war about UTF-8 be discussed on Unicore? Please, don't excommunicate us non-members from the discussion by restricting it to the members-only unicoRe list. We have something to contribute

RE: UTF-16 problems

2001-06-12 Thread Shigemichi Yazawa
At Mon, 11 Jun 2001 15:43:42 -0700, Carl W. Brown [EMAIL PROTECTED] wrote: At first I thought the same thing, but I have changed my mind. There are problems, but the problems are with UTF-16, not UTF-8. I don't think your new UTF-16 proposal solves any problem. It's yet another encoding. It won't

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable
On 06/11/2001 10:45:46 PM Mark Davis wrote: [earlier] - Oracle could probably make a case for their name for UTF8 simply being an anachronism. After all, the original definition of UTF-8 did convert surrogate pairs as they are doing in what they call UTF8. [now] UTF-8 was defined before

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable
In other words, Oracle has an alternate solution here for 9i -- they can simply explain that the old product defined the old pre-surrogate UTF-8 and the new product is now surrogate aware and uses the current definition. There's a mistake being made here that has been made repeatedly throughout

Re: UTF-8

2001-06-12 Thread toby_phipps
Will the Unicode version of UTF-8 be registered with IANA and, if so, what will be its charset designation? Firstly, please let's halt the confusion right here. I'll repeat myself once more. We're not redefining UTF-8. What we're proposing is not a Unicode version of UTF-8. It is another,

RE: UTF-8 Syntax

2001-06-12 Thread toby_phipps
Marco Cimarosti [EMAIL PROTECTED] wrote: My assumption was that, in the first case (no sort order requested by the client), a server could in theory provide a result set randomly shuffled. Of course, I know that this won't normally happen but, however, the server is allowed to provide whatever

Re: Missing characters for Italian

2001-06-12 Thread Michael Everson
At 18:43 -0700 2001-06-11, Rick McGowan wrote: Everson wrote: Lots of people with names like McGowan like to have the c, ostensibly an abbreviation for ac superscripted and underlined. ;-) (Sound of retching...) You mean Ack!? Uh, no. I like it just fine as-is. If I actually spelled my

RE: Missing characters for Italian

2001-06-12 Thread Marco Cimarosti
Antoine Leca shcrissi (Sicilian, this time): Marco Cimarosti écrivit (!): That is true. It is as true as the fact that when we French are to write the oe digraph, we *type* it as two separate letters, for lack of better solutions. The two issues are quite different. - The lack of French oe

RE: Missing characters for Italian

2001-06-12 Thread Marco Cimarosti
Peter Constable wrote: The point is that encodings currently used for French have none of these. Well, then, just do what the French do: don't use any of them, even though you may be tempted to use some. [...] The ideal for me, rather than adding the missing e and i, would be to

Re: New acquisition

2001-06-12 Thread James E. Agenbroad
Tuesday, June 12, 2001 Did the Lion dip his thorn in ink? Jim Agenbroad (disclaimer and addresses at bottom) On Mon, 11 Jun 2001, John Hudson wrote: At 15:56 6/11/2001 +0100, Michael Everson wrote: Shaw, Bernard. 1962. Androcles the

Fw: Undeliverable: Re: UTF8 vs AL32UTF8

2001-06-12 Thread Mark Davis
- Original Message - From: System Administrator [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, June 11, 2001 21:41 Subject: Undeliverable: Re: UTF8 vs AL32UTF8 Your message To: Misha Wolf Cc: [EMAIL PROTECTED] Subject: Re: UTF8 vs AL32UTF8 Sent:Tue,

Re: xICU 3.0 Status - (Simplified Unicode Implementation)

2001-06-12 Thread Mark Davis
We will respond more fully later, but I want to make it very clear that despite the unfortunate and confusing choice of name, xICU is not connected to the ICU product or team in any way. Mark - Original Message - From: Bill Kurmey [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday,

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable
On 06/12/2001 10:29:26 AM Mark Davis wrote: When applying UTF-8 -- as originally designed -- the sequence D800 DC00 would transform into a 6-byte sequence. Transforming back would result in the original sequence D800 DC00. When applying this to Unicode (16 bit only, at the

RE: Missing characters for Italian

2001-06-12 Thread Peter_Constable
What do you plan to propose for phonetic modifier letters supa, supo and supi: 1) Will you propose three new code points? 2) Will you propose to unify them with U+00AA, U+00BA and U+2071? If I were to propose new code points, the only differences might be between Ll and Lm, and that 00AA and

Re: UTF8 vs AL32UTF8

2001-06-12 Thread DougEwell2
In a message dated 2001-06-12 1:07:17 Pacific Daylight Time, [EMAIL PROTECTED] writes: There's a mistake being made here that has been made repeatedly throughout our discussion: that's to assume that there are two kinds of UTF-8: the original, in which the code unit sequence ED A0 80 ED

Re: UTF-8

2001-06-12 Thread Markus Scherer
Bill Kurmey wrote: Will the Unicode version of UTF-8 be registered with IANA and, if so, what will be its charset designation? I believe this question is based on a misunderstanding: 6-byte sequences have been mentioned in this discussion. The intended meaning was pairs of 3-byte sequences

RE: UTF-16 problems

2001-06-12 Thread Carl W. Brown
Toby, I agree that there is a need to preserve standards. Oracle did not support surrogates. If you passed it a UTF-16 data stream it would not be converted into proper UTF-8 encoding. At this juncture it should have fixed UTF8. This would have worked with the old data because it had no

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Mark Davis
That would be viewing history in the prism of present thought. When applying UTF-8 -- as originally designed -- the sequence D800 DC00 would transform into a 6-byte sequence. Transforming back would result in the original sequence D800 DC00. When applying this to Unicode (16
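Mark's point can be checked with a small sketch. The two helpers below are hypothetical and assume well-formed input: they apply the UTF-8 bit layout to each 16-bit code unit on its own, the way a pre-surrogate (UCS-2) implementation would, and the result is compared with Python's standard UTF-8 codec applied to the same character, U+10000.

```python
def encode_units_independently(units):
    """Apply the UTF-8 bit layout to each 16-bit code unit separately,
    with no special treatment of surrogates (pre-surrogate behaviour)."""
    out = bytearray()
    for u in units:
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            out += bytes([0xE0 | (u >> 12),
                          0x80 | ((u >> 6) & 0x3F),
                          0x80 | (u & 0x3F)])
    return bytes(out)


def decode_units_independently(data):
    """Inverse of the above; assumes well-formed 1- to 3-byte sequences."""
    units, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            units.append(b)
            i += 1
        elif b < 0xE0:
            units.append(((b & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        else:
            units.append(((b & 0x0F) << 12)
                         | ((data[i + 1] & 0x3F) << 6)
                         | (data[i + 2] & 0x3F))
            i += 3
    return units


pair = [0xD800, 0xDC00]                      # UTF-16 code units for U+10000
old = encode_units_independently(pair)
print(old.hex(" ").upper())                  # ED A0 80 ED B0 80 (six bytes)
print([f"{u:04X}" for u in decode_units_independently(old)])  # ['D800', 'DC00']
print("\U00010000".encode("utf-8").hex(" ").upper())          # F0 90 80 80
```

The round trip gives back D800 DC00, as Mark describes, while the standard codec produces a single 4-byte sequence for the same character.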

RE: xICU 3.0 Status - (Simplified Unicode Implementation)

2001-06-12 Thread Carl W. Brown
Bill, This product is not developed by the ICU development team. We at X.Net are making this code available for people who are interested in implementing ICU. This is designed to simplify ICU implementation if you choose to use it. Even if you use it you can always invoke ICU directly. I

RE: xICU 3.0 Status - (Simplified Unicode Implementation)

2001-06-12 Thread Carl W. Brown
Mark, I thought hard about the name. I thought that maybe it should be totally different. This would imply that it is a product in and of itself that happens to use ICU. Instead it is more like a sample program to be used to implement ICU, not a product in and of itself. It provides

Re: UTF-8

2001-06-12 Thread Misha Wolf
On 12/06/2001 06:43:10 Bill Kurmey wrote: [...] Is the following an accurate statement of the present situation? Currently, if an email client receives a message with Content Type: containing charset=UTF-8 and accepts up to 6 octets for each scalar value, it would be considered Unicode
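For comparison with the statement quoted above, here is a sketch of the sequence-length rules. The first table follows the original 31-bit design (as in RFC 2279), which allowed up to six octets. The second function uses Python's UTF-8 codec restricted to Unicode scalar values, where no character needs more than four octets. Both function names are illustrative.

```python
# (sequence length, largest value it can carry) in the original 31-bit UTF-8
ORIGINAL_LIMITS = [(1, 0x7F), (2, 0x7FF), (3, 0xFFFF),
                   (4, 0x1FFFFF), (5, 0x3FFFFFF), (6, 0x7FFFFFFF)]


def original_utf8_length(value):
    """Octets needed for a 31-bit value under the original UTF-8 rules."""
    for length, limit in ORIGINAL_LIMITS:
        if 0 <= value <= limit:
            return length
    raise ValueError("value does not fit in 31 bits")


def unicode_utf8_length(cp):
    """Octets needed for a Unicode scalar value (U+0000..U+10FFFF, no surrogates)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    return len(chr(cp).encode("utf-8"))


for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF):
    print(f"U+{cp:04X}: original {original_utf8_length(cp)}, "
          f"Unicode {unicode_utf8_length(cp)}")
print(original_utf8_length(0x7FFFFFFF))   # 6, but such values lie outside Unicode
```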

Re: UTF-16 problems

2001-06-12 Thread Jianping Yang
Lisa Moore wrote: Jianping wrote: only Oracle provides fully UTF-8 and UTF-16 support for RDBMS Whoa... let me interject: DB2 for OS/390 supports UTF-8 and UTF-16. And DB2 for Intel and Unix supported both much earlier. I cannot speak to Jianping's interpretation of "fully". The "fully"

Re: 19th Unicode Conference, September 2001, San Jose, CA, USA

2001-06-12 Thread Misha Wolf
On 12/06/2001 04:16:50 Peter Constable wrote: [...] I agree. I scheduled a week-long engagement mid-Sept. expecting from the past few years IUC to be held the first week of Sept. This has resulted in a conflict requiring me to adjust travel plans. It also wouldn't hurt to advertise future

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable
On 06/12/2001 01:13:48 PM Jianping Yang wrote: If you convert ED A0 80 ED B0 80 into UTF-16, what does it mean then? I think definitely it means U-0001. I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*. UTF-8 has no 6-byte sequences. It must be something else, like
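Peter's point can be illustrated with Python's codecs, which follow the current UTF-8 definition. A strict decoder refuses ED A0 80 ED B0 80. Only after decoding each 3-byte sequence to a lone surrogate (the non-default "surrogatepass" error handler) and then pairing the surrogates as UTF-16 does the sequence come out as U+10000. The helper names are illustrative, and the second function shows one possible reading of the bytes, not a UTF-8 rule.

```python
data = bytes.fromhex("ED A0 80 ED B0 80")


def accepted_by_strict_utf8(b):
    """True if the bytes are well-formed UTF-8 under Python's default codec."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def read_as_surrogate_pairs(b):
    """Decode each 3-byte sequence to a lone surrogate, then pair the
    surrogates by round-tripping through UTF-16 (a UTF-8S-style reading)."""
    units = b.decode("utf-8", "surrogatepass")          # '\ud800\udc00'
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(accepted_by_strict_utf8(data))                 # False
print(f"U+{ord(read_as_surrogate_pairs(data)):X}")   # U+10000
```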

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Jianping Yang
[EMAIL PROTECTED] wrote: On 06/11/2001 10:45:46 PM Mark Davis wrote: [earlier] - Oracle could probably make a case for their name for UTF8 simply being an anachronism. After all, the original definition of UTF-8 did convert surrogate pairs as they are doing in what they call UTF8.

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Jianping Yang
[EMAIL PROTECTED] wrote: On 06/12/2001 01:13:48 PM Jianping Yang wrote: If you convert ED A0 80 ED B0 80 into UTF-16, what does it mean then? I think definitely it means U-0001. I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*. So UTF-8 is not compatible

FSS-UTF, UTF-2, UTF-8, and UTF-16

2001-06-12 Thread Kenneth Whistler
Mark said: UTF-8 was defined before UTF-16. At the time it was first defined, there were no surrogates, so there was no special handling of the D800..DFFF code points. Technically, the first statement is not true. UTF-2 and FSS-UTF *were* defined well before UTF-16. FSS-UTF was defined on

Re: FW: Russian Unicode Convertion

2001-06-12 Thread Otto Stolz
On Monday, June 11, 2001 4:14 AM, Vadim Snurnikov wrote: How can I read a text in Unicode (Russian) where every Russian letter is represented like that: D=B6 (or similar)? Unfortunately, all these four characters that stand for one Russian letter are of one byte each, so that I am getting 4
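The symptom described looks like quoted-printable transfer encoding of UTF-8 text. That is an assumption, since the snippet is cut off. Under that assumption, a letter such as ж (U+0436), whose UTF-8 form is D0 B6, travels as =D0=B6 and can be recovered with Python's standard quopri module. The function name is illustrative.

```python
import quopri


def decode_qp_utf8(body: bytes) -> str:
    """Undo quoted-printable transfer encoding, then decode the bytes as UTF-8."""
    return quopri.decodestring(body).decode("utf-8")


print(decode_qp_utf8(b"=D0=B6"))                                # ж
print(decode_qp_utf8(b"=D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82"))  # Привет
```

A mail client that honours the message's Content-Transfer-Encoding and charset headers performs both steps itself.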

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Peter_Constable
On 06/12/2001 01:13:48 PM Jianping Yang wrote: If you convert ED A0 80 ED B0 80 into UTF-16, what does it mean then? I think definitely it means U-0001. Please read the definitions and tell me how you support that. The only way I can see to support that is to assume that the mapping

And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Kenneth Whistler
Case I. Code points U-D800..U-DFFF excluded from the UTF's. The way God intended it to be:
    code point   UTF-8       UTF-16   UTF-32
 a.           =  00
 b. D7FF      =  ED 9F BF    D7FF     D7FF
 g. E000      =  EE 80 80
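Rows like those in Case I can be regenerated from the codecs that ship with Python, which implement the current definitions (surrogate code points excluded). The helper below is illustrative. It formats UTF-16 as 16-bit code units and UTF-32 as a single 8-digit unit.

```python
def utf_row(cp):
    """Return (code point, UTF-8, UTF-16, UTF-32) in hex for one scalar value."""
    ch = chr(cp)
    return (
        f"{cp:04X}",
        ch.encode("utf-8").hex(" ").upper(),
        ch.encode("utf-16-be").hex(" ", 2).upper(),   # group bytes into 16-bit units
        ch.encode("utf-32-be").hex().upper(),
    )


print(f"{'code point':>10}  {'UTF-8':<12}{'UTF-16':<11}UTF-32")
for cp in (0x0000, 0xD7FF, 0xE000, 0xFFFF, 0x10000, 0x10FFFF):
    point, u8, u16, u32 = utf_row(cp)
    print(f"{point:>10}  {u8:<12}{u16:<11}{u32}")
```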

U+007E and U+02DC

2001-06-12 Thread Patrick Andries
The Unicode Standard 3.0 (page 149) says that U+007E can be used as a Spacing Clone of Combining Tilde. But isn't this the function of U+02DC (the so-called SMALL TILDE)? Why suggest this usage then and not point to U+02DC? Could one say (as some typographers see it) that U+007E should
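The character properties behind the question can be inspected with Python's unicodedata module, which reflects the Unicode version bundled with the interpreter, not necessarily Unicode 3.0. In that data U+02DC SMALL TILDE carries a compatibility decomposition to SPACE + U+0303 COMBINING TILDE, while U+007E TILDE has none. This shows what the database records, not how the characters should be used typographically.

```python
import unicodedata

for ch in ("\u007e", "\u02dc", "\u0303"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<16} "
          f"category={unicodedata.category(ch)} "
          f"decomposition={unicodedata.decomposition(ch) or '(none)'}")
```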

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang
One thing needs to clarify here is that there is no four byte encoding in UTF-8S proposal and four byte encoding is illegal but not irregular. As everything in UTF-8S is perfect match to UTF-16, any blame to this proposal also applies to UTF-16 encoding form. Regards, Jianping. Kenneth Whistler
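The "perfect match to UTF-16" claim is about binary sort order, and a sketch makes it concrete. The helper cesu8_encode is hypothetical: it encodes each UTF-16 code unit separately, which is how supplementary characters are represented in UTF-8S as described in this thread (the form later documented as CESU-8). Comparing U+FF21 with U+10000 shows that plain UTF-8 byte order follows code point order, whereas the pair-of-3-byte-sequences form follows UTF-16 code unit order.

```python
import struct


def utf16_units(s):
    """The string's UTF-16 code units as a list of integers."""
    data = s.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))


def cesu8_encode(s):
    """Hypothetical helper: encode every UTF-16 code unit, surrogates included,
    as its own 1- to 3-byte UTF-8-style sequence."""
    return "".join(chr(u) for u in utf16_units(s)).encode("utf-8", "surrogatepass")


words = ["\uFF21", "\U00010000"]   # FULLWIDTH LATIN CAPITAL LETTER A, then U+10000

by_code_point = sorted(words)                       # Python compares code points
by_utf8 = sorted(words, key=lambda w: w.encode("utf-8"))
by_utf16 = sorted(words, key=utf16_units)
by_cesu8 = sorted(words, key=cesu8_encode)

print([f"U+{ord(w):04X}" for w in by_utf8])    # ['U+FF21', 'U+10000']
print([f"U+{ord(w):04X}" for w in by_cesu8])   # ['U+10000', 'U+FF21']
```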

U+2011 and U+2010

2001-06-12 Thread Patrick Andries
The Unicode Standard 3.0 (page 150) says that U+2011 NON-BREAKING HYPHEN is present for compatibility with existing standards as if it shouldn't really be encoded. But isn't its relation to U+2010 the same as the one that opposes SPACE to NO-BREAK SPACE, i.e. a semantic (behavioural) one?
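Patrick's analogy is visible in the character data: both U+2011 and U+00A0 carry a <noBreak> compatibility decomposition to their breaking counterparts. A quick check with Python's unicodedata module, whose data version depends on the interpreter:

```python
import unicodedata

for no_break, base in (("\u2011", "\u2010"), ("\u00a0", "\u0020")):
    folded = unicodedata.normalize("NFKC", no_break)
    print(f"{unicodedata.name(no_break)}: {unicodedata.decomposition(no_break)}"
          f"  NFKC -> {unicodedata.name(folded)}")
```

NFKC therefore folds each no-break character to its ordinary counterpart, losing the line-breaking distinction.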

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang
Kenneth Whistler wrote: Jianping wrote: One thing needs to clarify here is that there is no four byte encoding in UTF-8S proposal and four byte encoding is illegal but not irregular. As everything in UTF-8S is perfect match to UTF-16, any blame to this proposal also applies to UTF-16

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang
Kenneth Whistler wrote: Jianping responded: Kenneth Whistler wrote: Jianping wrote: One thing needs to clarify here is that there is no four byte encoding in UTF-8S proposal and four byte encoding is illegal but not irregular. As everything in UTF-8S is perfect match to

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Kenneth Whistler
Jianping responded: Kenneth Whistler wrote: Jianping wrote: One thing needs to clarify here is that there is no four byte encoding in UTF-8S proposal and four byte encoding is illegal but not irregular. As everything in UTF-8S is perfect match to UTF-16, any blame to this

UTF-8s programming problems

2001-06-12 Thread Carl W. Brown
UTF-8s is reminiscent of a problem that I had installing a certain vendor's terminals. Each screen was about 2K of data. The terminal communications protocol broke the data into 128 byte chunks. Each block had a small header and the terminal would wait for a response before the next block was

Re: U+2011 and U+2010

2001-06-12 Thread Kenneth Whistler
Patrick Andries asked: The Unicode Standard 3.0 (page 150) says that U+2011 NON-BREAKING HYPHEN is present for compatibility with existing standards as if it shouldn't really be encoded. But isn't its relation to U+2010, the same as the one that opposes SPACE to NO-BREAK SPACE, i.e. a

Russian Unicode Convertion

2001-06-12 Thread Vladimir Ivanov
Vadim Snurnikov wrote: How can I read a text in Unicode (Russian) where every Russian letter is represented like that: D=B6 (or similar)? (The e-mail got transferred to this format.) What kind of software is used to get E-mail? I recommend Outlook Express 5.0 and above. It allows you to

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Kenneth Whistler
Jianping said: What you finally stated today is that F0 90 80 80 is flat-out *illegal* in UTF-8s. That was a missing piece of the puzzle for anyone trying to interpret what you are proposing. In the UTF-8S, there should be no irregular forms, should we repeat the history again?
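This is a sketch of what "flat-out illegal" would mean for a decoder. decode_utf8s is a hypothetical function, written from the description in this thread rather than from any published specification. It accepts surrogate pairs written as two 3-byte sequences, rejects 4-byte sequences such as F0 90 80 80, and rejects unpaired surrogates.

```python
def decode_utf8s(data: bytes) -> str:
    """Hypothetical strict decoder for the UTF-8S form discussed here."""
    # 'surrogatepass' lets 3-byte sequences for D800..DFFF through as lone surrogates.
    units = data.decode("utf-8", "surrogatepass")
    # A code point above U+FFFF at this stage can only come from a 4-byte sequence.
    if any(ord(c) > 0xFFFF for c in units):
        raise ValueError("4-byte sequences are illegal in UTF-8S")
    # Pair the surrogates; the strict UTF-16 decoder raises on unpaired ones.
    return units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


for sample in ("EDA080EDB080", "F0908080", "EDA080"):
    try:
        result = decode_utf8s(bytes.fromhex(sample))
        print(sample, "->", f"U+{ord(result):04X}")
    except ValueError as err:   # UnicodeDecodeError is a subclass of ValueError
        print(sample, "-> rejected:", err)
```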

UTF-8S: a modest proposal

2001-06-12 Thread John Cowan
I would urge Oracle and friends to move this to a different venue, specifically [EMAIL PROTECTED] As far as I can see, UTF-8S does not need either the approval or the disapproval of the Unicode Consortium. If it is actually in use, it needs a label -- and IANA is in the business of assigning

Re: U+2011 and U+2010

2001-06-12 Thread Peter_Constable
In fact, in this particular case, if I recall, the distinctions were probably considered to be good practice, and not something to be mapped away. XCCS was often a *model* for early Unicode, rather than a character encoding that forced the grudging inclusion of many icky characters that we would

Re: U+2011 and U+2010

2001-06-12 Thread Michael \(michka\) Kaplan
From: [EMAIL PROTECTED] Out of curiosity, is there documentation on XCCS available anywhere? Check out google.com: it will get about 120+ hits on the words XCCS standard and several of them seem vaguely relevant. :-) MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/