When folks tell me not to worry as in my previous query, then I really do
get worried and try not to be hurried. (last clause not of semantic intent
but scans well.) 8-)
So a few points of clarification please.
Will the Unicode version of UTF-8 be registered with IANA and, if so, what
will
In a message dated 2001-06-11 21:46:38 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
Shouldn't a war about UTF-8 be discussed on Unicore?
Please, don't excommunicate us non-members from the discussion by restricting
it to the members-only unicoRe list. We have something to contribute
At Mon, 11 Jun 2001 15:43:42 -0700,
Carl W. Brown [EMAIL PROTECTED] wrote:
At first I thought the same thing, but I have changed my mind. There are
problems but the problems are with UTF-16 not UTF-8.
I don't think your new UTF-16 proposal solves any problem. It's yet
another encoding. It won't
On 06/11/2001 10:45:46 PM Mark Davis wrote:
[earlier]
- Oracle could probably make a case for their name for UTF8 simply being an
anachronism. After all, the original definition of UTF-8 did convert
surrogate pairs as they are doing in what they call UTF8.
[now]
UTF-8 was defined before
In other words, Oracle has an alternate solution here for 9i -- they can
simply explain that the old product defined the old pre-surrogate UTF-8 and
the new product is now surrogate aware and uses the current definition.
There's a mistake being made here that has been made repeatedly throughout
Will the Unicode version of UTF-8 be registered with IANA and, if so, what
will be its charset designation?
Firstly, please let's halt the confusion right here. I'll repeat myself
once more. We're not redefining UTF-8. What we're proposing is not a
Unicode version of UTF-8. It is another,
Marco Cimarosti [EMAIL PROTECTED] wrote:
My assumption was that, in the first case (no sort order requested by the
client), a server could in theory provide a result set randomly shuffled. Of
course, I know that this won't normally happen but, however, the server is
allowed to provide whatever
At 18:43 -0700 2001-06-11, Rick McGowan wrote:
Everson wrote:
Lots of people with names like McGowan like to have the c,
ostensibly an abbreviation for ac superscripted and underlined. ;-)
(Sound of retching...)
You mean Ack!?
Uh, no. I like it just fine as-is. If I
actually spelled my
Antoine Leca shcrissi (Sicilian, this time):
Marco Cimarosti écrivit (!):
That is true. It is as true as the fact that when we French
are to write the oe digraph, we *type* it as two separate
letters, for lack of better solutions.
The two issues are quite different.
- The lack of French oe
Peter Constable wrote:
The point is that encodings currently used for French have
none of these.
Well, then, just do what the French do: don't use any of
them, even though you may be tempted to use some.
[...]
The ideal for me, rather than adding the missing e and
i, would be to
Tuesday, June 12, 2001
Did the Lion dip his thorn in ink?
Jim Agenbroad (disclaimer and addresses at bottom)
On Mon, 11 Jun 2001, John Hudson wrote:
At 15:56 6/11/2001 +0100, Michael Everson wrote:
Shaw, Bernard. 1962. Androcles the
- Original Message -
From: System Administrator [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, June 11, 2001 21:41
Subject: Undeliverable: Re: UTF8 vs AL32UTF8
Your message
To: Misha Wolf
Cc: [EMAIL PROTECTED]
Subject: Re: UTF8 vs AL32UTF8
Sent:Tue,
We will respond more fully later, but I want to make it very clear that
despite the unfortunate and confusing choice of name, xICU is not
connected to the ICU product or team in any way.
Mark
- Original Message -
From: Bill Kurmey [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday,
On 06/12/2001 10:29:26 AM Mark Davis wrote:
When applying UTF-8 -- as originally designed -- the sequence D800
DC00 would transform into a 6-byte sequence. Transforming back would
result with the original sequence D800 DC00. When applying this to
Unicode (16 bit only, at the
What do you plan to propose for phonetic modifier letters supa,
supo and supi:
1) Will you propose three new code points?
2) Will you propose to unify them with U+00AA, U+00BA and U+2071?
If I were to propose new code points, the only differences might be between
Ll and Lm, and that 00AA and
In a message dated 2001-06-12 1:07:17 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
There's a mistake being made here that has been made repeatedly throughout
our discussion: that's to assume that there are two kinds of UTF-8: the
original, in which the code unit sequence ED A0 80 ED
Bill Kurmey wrote:
Will the Unicode version of UTF-8 be registered with IANA and, if so, what
will be its charset designation?
I believe this question is based on a misunderstanding:
6-byte sequences have been mentioned in this discussion. The intended meaning was
pairs of 3-byte sequences
Toby,
I agree that there is a need to preserve standards. Oracle did not support
surrogates. If you passed it a UTF-16 data stream it would not be converted
into proper UTF-8 encoding. At this juncture it should have fixed UTF8.
This would have worked with the old data because it had no
That would be viewing history in the prism of present thought.
When applying UTF-8 -- as originally designed -- the sequence D800
DC00 would transform into a 6-byte sequence. Transforming back would
result with the original sequence D800 DC00. When applying this to
Unicode (16
Bill,
This product is not developed by the ICU development team. We at X.Net are
making this code available for people who are interested in implementing
ICU. This is designed to simplify ICU implementation if you choose to use
it. Even if you use it you can always invoke ICU directly.
I
Mark,
I thought hard about the name. I thought that maybe it should be totally
different. This would imply that it is a product in and of itself that
happens to use ICU. Instead it is more like a sample program to be used to
implement ICU, not a product in and of itself.
It provides
On 12/06/2001 06:43:10 Bill Kurmey wrote:
[...]
Is the following an accurate statement of the present situation?
Currently, if an email client receives a message with Content Type:
containing charset=UTF-8 and accepts up to 6 octets for each scalar
value, it would be considered Unicode
Lisa Moore wrote:
Jianping wrote:
only Oracle provides fully UTF-8 and
UTF-16 support for RDBMS
Whoa...let me interject, DB2 for OS/390 supports UTF-8 and UTF-16. And DB2
for Intel, Unix, supported both much earlier. I cannot speak to Jianping's
interpretation of fully
The fully
On 12/06/2001 04:16:50 Peter Constable wrote:
[...]
I agree. I scheduled a week-long engagement mid-Sept., expecting from the
pattern of the past few years that IUC would be held the first week of Sept.
This has resulted in
a conflict requiring me to adjust travel plans. It also wouldn't hurt to
advertise future
On 06/12/2001 01:13:48 PM Jianping Yang wrote:
If you convert ED A0 80 ED B0 80 into UTF-16, what does it mean then? I
think definitely it means U-00010000.
I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*.
UTF-8 has no 6-byte sequences. It must be something else, like
[EMAIL PROTECTED] wrote:
On 06/11/2001 10:45:46 PM Mark Davis wrote:
[earlier]
- Oracle could probably make a case for their name for UTF8 simply being an
anachronism. After all, the original definition of UTF-8 did convert
surrogate pairs as they are doing in what they call UTF8.
[EMAIL PROTECTED] wrote:
On 06/12/2001 01:13:48 PM Jianping Yang wrote:
If you convert ED A0 80 ED B0 80 into UTF-16, what does it mean then? I
think definitely it means U-00010000.
I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*.
So UTF-8 is not compatible
Mark said:
UTF-8 was defined before UTF-16. At the time it was first defined, there
were no surrogates, so there was no special handling of the D800..DFFF code
points.
Technically, the first statement is not true.
UTF-2 and FSS-UTF *were* defined well before UTF-16. FSS-UTF was
defined on
On Monday, June 11, 2001 4:14 AM, Vadim Snurnikov wrote:
How can I read a text in Unicode (Russian) where every Russian letter
is represented like that: D=B6 (or similar)? Unfortunately, all these
four characters that stand for one Russian letter are of one byte each,
so that I am getting 4
On 06/12/2001 01:13:48 PM Jianping Yang wrote:
If you convert ED A0 80 ED B0 80 into UTF-16, what does it mean then? I
think definitely it means U-00010000.
Please read the definitions and tell me how you support that.
The only way I can see to support that is to assume that the mapping
Case I. Code points U-D800..U-DFFF excluded
from the UTF's. The way God intended it to be

   code point   UTF-8      UTF-16   UTF-32
a.            = 00
b. D7FF       = ED 9F BF   D7FF     D7FF
g. E000       = EE 80 80
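
The boundary rows of Case I can be checked with a short Python sketch. It assumes Python 3's built-in codecs, which follow the current UTF-8 definition and are not part of the original message. The helper names are hypothetical. It encodes the code points on either side of the surrogate range and shows that a lone surrogate is refused:

```python
# Sketch only: relies on Python 3's built-in UTF-8 codec (an assumption,
# not something used in the thread). Helper names are illustrative.

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'ED 9F BF'."""
    return " ".join(f"{b:02X}" for b in data)

def utf8_or_none(code_point: int):
    """Return the UTF-8 bytes for a code point, or None if the codec refuses it."""
    try:
        return chr(code_point).encode("utf-8")
    except UnicodeEncodeError:
        return None

below = utf8_or_none(0xD7FF)      # last code point before the surrogate range
above = utf8_or_none(0xE000)      # first code point after the surrogate range
surrogate = utf8_or_none(0xD800)  # inside the excluded range

print(hex_bytes(below))
print(hex_bytes(above))
print(surrogate)
```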
The Unicode Standard 3.0 (page 149) says that U+007E can be used as a
Spacing Clone of Combining Tilde. But isn't this the function of U+02DC
(the so-called SMALL TILDE)? Why suggest this usage then and not point to
U+02DC?
Could one say (as some typographers see it) that U+007E should
One thing needs to clarify here is that there is no four byte encoding in
UTF-8S proposal and four byte encoding is illegal but not irregular. As
everything in UTF-8S is perfect match to UTF-16, any blame to this proposal
also applies to UTF-16 encoding form.
Regards,
Jianping.
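
To make the two byte sequences under discussion concrete, here is a rough Python sketch. It is an illustration under assumptions, not the text of the UTF-8S proposal: the function name `utf8s_style_encode` is hypothetical, and the behaviour (each UTF-16 surrogate encoded as its own 3-byte sequence) is inferred only from how the thread describes it.

```python
# Sketch only: contrasts standard UTF-8 with the "pair of 3-byte sequences"
# form described in this thread. utf8s_style_encode is a hypothetical name.

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in data)

def utf8s_style_encode(text: str) -> bytes:
    """Encode supplementary characters via their UTF-16 surrogates,
    3 bytes per surrogate; BMP characters use ordinary UTF-8."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)    # leading (high) surrogate
            low = 0xDC00 + (v & 0x3FF)   # trailing (low) surrogate
            for unit in (high, low):
                # 'surrogatepass' lets the codec emit the 3-byte form
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)

standard = "\U00010000".encode("utf-8")
paired = utf8s_style_encode("\U00010000")

print(hex_bytes(standard))
print(hex_bytes(paired))
```

The first line printed is the 4-byte UTF-8 form; the second is the 6-byte form (two 3-byte sequences) that the thread keeps returning to.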
Kenneth Whistler
The Unicode Standard 3.0 (page 150) says that U+2011 NON-BREAKING HYPHEN is
present for compatibility with existing standards as if it shouldn't really
be encoded. But isn't its relation to U+2010, the same as the one that
opposes SPACE to NO-BREAK SPACE, i.e. a semantic (behavioural) one ?
Kenneth Whistler wrote:
Jianping wrote:
One thing needs to clarify here is that there is no four byte encoding in
UTF-8S proposal and four byte encoding is illegal but not irregular. As
everything in UTF-8S is perfect match to UTF-16, any blame to this proposal
also applies to UTF-16
Kenneth Whistler wrote:
Jianping responded:
Kenneth Whistler wrote:
Jianping wrote:
One thing needs to clarify here is that there is no four byte encoding in
UTF-8S proposal and four byte encoding is illegal but not irregular. As
everything in UTF-8S is perfect match to
Jianping responded:
Kenneth Whistler wrote:
Jianping wrote:
One thing needs to clarify here is that there is no four byte encoding in
UTF-8S proposal and four byte encoding is illegal but not irregular. As
everything in UTF-8S is perfect match to UTF-16, any blame to this
UTF-8s is reminiscent of a problem that I had installing a certain vendor's
terminals. Each screen was about 2K of data. The terminal communications
protocol broke the data into 128 byte chunks. Each block had a small header
and the terminal would wait for a response before the next block was
Patrick Andries asked:
The Unicode Standard 3.0 (page 150) says that U+2011 NON-BREAKING HYPHEN is
present for compatibility with existing standards as if it shouldn't really
be encoded. But isn't its relation to U+2010, the same as the one that
opposes SPACE to NO-BREAK SPACE, i.e. a
Vadim Snurnikov wrote:
How can I read a text in Unicode (Russian) where every Russian letter is
represented like that: D=B6 (or similar)? (The e-mail got
transferred to this format.)
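
One possible explanation, offered only as an assumption, is that the message body is UTF-8 text sent with quoted-printable transfer encoding, in which each non-ASCII byte appears as `=XX`. A minimal Python sketch follows. The sample string `=D0=B6` is hypothetical and not taken from the original mail:

```python
# Sketch only: assumes the mail body is UTF-8 in quoted-printable form.
import quopri

sample = b"=D0=B6"                 # hypothetical sample, two escaped bytes
raw = quopri.decodestring(sample)  # undo the =XX escapes, giving raw bytes
text = raw.decode("utf-8")         # interpret those bytes as UTF-8

print(raw)
print(text)
```

Under that assumption the two escaped bytes decode to a single Cyrillic letter, which would explain why one letter shows up as several one-byte characters in a client that does not undo the transfer encoding.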
What kind of software is used to get E-mail?
I recommend Outlook Express 5.0 and above. It allows you to
Jianping said:
What you finally stated today is that F0 90 80 80 is flat-out
*illegal* in UTF-8s. That was a missing piece of the puzzle for anyone
trying to interpret what you are proposing.
In the UTF-8S, there should be no irregular forms, should we repeat the history
again?
I would urge Oracle and friends to move this to a different venue, specifically
[EMAIL PROTECTED] As far as I can see, UTF-8S does not need either
the approval or the disapproval of the Unicode Consortium. If it is
actually in use, it needs a label -- and IANA is in the business of assigning
In fact, in this particular case, if I recall, the distinctions were
probably considered to be good practice, and not something to be mapped
away. XCCS was often a *model* for early Unicode, rather than a character
encoding that forced the grudging inclusion of many icky characters
that we would
From: [EMAIL PROTECTED]
Out of curiosity, is there documentation on XCCS available anywhere?
Check out google.com: it will get about 120+ hits on the words XCCS
standard and several of them seem vaguely relevant. :-)
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/