Simon Law wrote:
<< In Oracle9i our next Database Release shipping this summer, we have introduced
support for two new Unicode character sets. ...>>
New character *sets* ???
> From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
>
> I resisted calling it FTF-8 (Funky Transfer Format - 8), but if you want to
> call it Weird Transfer Format - 8, I don't have any real objections.
Well, that's ONE possible translation of "WTF"...
/|/|ike
wn'; Simon Law; [EMAIL PROTECTED]
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
If you have this funny encoding please don't call it UTF8 because it is not
UTF8 and will only confuse users. You could call it OTF8 or something like
that but not UTF8.
If you have this funny encoding please don't call it UTF8 because it is not
UTF8 and will only confuse users. You could call it OTF8 or something like
that but not UTF8.
How about "WTF-8"?
Sorry - I couldn't resist.
/|/|ike
ut
not UTF8.
Carl
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Simon Law
Sent: Wednesday, May 30, 2001 11:02 AM
To: [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
Hi Folks,
Over the last f
acters from all scripts are represented in 2 bytes.
Comments?
-Original Message-
From: Simon Law [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 30, 2001 8:02 PM
To: [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
Hi Folks,
O
someone emits the b
michka
- Original Message -
From: "Simon Law" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, May 30, 2001 11:01 AM
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
> Hi Folks,
>
> Over the last
Simon,
Would you care to answer (officially) why exactly Oracle needs anything
to be done here? Per the spec, it is not illegal for a process to interpret
5/6-byte supplementary characters; it is only illegal to emit them. It seems
that Oracle and everyone else is well covered with the existi
8 and UTF-32 system that sort like
UTF-16 is folly.
Carl
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Simon Law
Sent: Wednesday, May 30, 2001 11:02 AM
To: [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in we
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> According to the proposal, UTF-8S and UTF-32S would not have the same
> status: they wouldn't be for interchange; they'd just be for representation
> internal to a given system, like UTF-EBCDIC (which, I think I heard, has
> not actual
Tuesday, May 29, 2001 3:47 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
Carl,
> Ken,
>
> UTF-8s is essentially a way to ignore surrogate processing. It allows a
> company to encode UTF-16
From: "Carl W. Brown" <[EMAIL PROTECTED]>;
To: [EMAIL PROTECTED];
Cc:
Date: 01/05/30 0:46
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
>Ken,
>
>I suspect that Oracle is specifically pushing for this standard because
ay, May 29, 2001 3:47 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
Carl,
> Ken,
>
> UTF-8s is essentially a way to ignore surrogate processing. It allows a
> company to encode UTF-16 wit
Carl,
> Ken,
>
> UTF-8s is essentially a way to ignore surrogate processing. It allows a
> company to encode UTF-16 with UCS-2 logic.
>
> The problem is that by not implementing surrogate support you can introduce
> subtle errors. For example it is common to break buffers apart into
> segment
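To make the "UTF-16 with UCS-2 logic" point concrete, here is a minimal sketch of what such an encoder would produce. The function name encode_utf8s is hypothetical, and the sketch assumes UTF-8s means "convert each 16-bit code unit to UTF-8 on its own", as described in the quoted text; it is not taken from any Oracle specification.

```python
def encode_utf8s(text: str) -> bytes:
    """Hypothetical UTF-8s encoder: each UTF-16 code unit is converted
    to UTF-8 separately, so surrogates are never combined."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


supplementary = "\U00010000"                 # first character beyond the BMP
standard = supplementary.encode("utf-8")     # regular UTF-8: one 4-byte sequence
per_unit = encode_utf8s(supplementary)       # two 3-byte sequences, 6 bytes total

print(standard.hex(" "))
print(per_unit.hex(" "))
```

A process using UCS-2 logic never has to notice that the two 3-byte sequences belong together, which is exactly the property being criticized.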
.
Carl
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Kenneth Whistler
Sent: Tuesday, May 29, 2001 11:18 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
Doug
Antoine Leca wrote:
> Jianping Yang wrote:
> >
> > As a matter of fact, the surrogate or supplementary character was not defined
> > in the past,
>
> How long is "the past"? I remember reading about these surrogates the first
> time I put my hands on a draft copy of ISO 10646. It was nearly six
Doug wrote:
> UTF-8 and UTF-32 should absolutely not be similarly hacked to maintain some
> sort of bizarre "compatibility" with the binary sorting order of UTF-16.
> UTC should not, and almost certainly will not, endorse such a proposal on the
> part of the database vendors.
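The sorting difference at issue can be reproduced in a few lines. This is only an illustrative sketch: the two sample characters and the helper name binary_order are chosen here for the example and do not come from the thread.

```python
# One BMP character above the surrogate range and one supplementary character.
bmp = "\uff5e"        # U+FF5E
supp = "\U00010000"   # U+10000


def binary_order(encoding: str) -> list:
    """Sort the two characters by comparing their encoded bytes."""
    return sorted([supp, bmp], key=lambda s: s.encode(encoding))


orders = {enc: binary_order(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}

for enc, order in orders.items():
    print(enc, [f"U+{ord(c):04X}" for c in order])
```

UTF-8 and UTF-32 put U+FF5E first, in code point order. UTF-16 puts U+10000 first, because its lead surrogate (0xD800) compares lower than 0xFF5E.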
I would be l
On 05/27/2001 08:03:37 PM Jianping Yang wrote:
>>But it seems to me that we've lived without
>>Premise B in the past, and that it won't benefit us to adopt it now. Why
>>bother with it? Why not continue doing what we already know how to do?
>As a matter of fact, the surrogate or supplementary c
From: "Jianping Yang" <[EMAIL PROTECTED]>
> As a matter of fact, the surrogate or supplementary
> character was not defined in the past, so we could
> live without Premise B in the past. But now that the
> supplementary character is defined and will soon be
> supported, we have to bother with it.
Poo
-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of [EMAIL PROTECTED]
Sent: Monday, May 28, 2001 3:30 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
In a message dated 2001-05-26 16:00:47 Pacific Day
In a message dated 2001-05-26 16:00:47 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> The issue is this: Unicode's three encoding forms don't sort in the same
> way when sorting is done using that most basic and
> valid-in-almost-no-locales-but-easy-and-quick approach of simply comparing
Jianping Yang wrote:
>
> As a matter of fact, the surrogate or supplementary character was not defined
> in the past,
How long is "the past"? I remember reading about these surrogates the first
time I put my hands on a draft copy of ISO 10646. It was nearly six years ago.
Or do you mean that it
★じゅういっちゃん★
EKYWY TXLY NPZ P MPVD XPHYV LPWWQY
NKT ZPN XT WYPZTX PE PMM ET HPWWD
"EYX EKTSZPXV'Z HTWY GSX
P XSHOYW EKPX TXY
PXV LTHHQEHYXE, ET HY, QZ RSQEY ZLPWD"
>>
>> >There was another abomination proposed. Oracle rather than adding UTF-16
>> >support proposed that non plan
I don't want to argue on this lengthy email, but only point out two facts:
>According to the proposal, UTF-8S and UTF-32S would not have the same
>status: they wouldn't be for interchange; they'd just be for representation
>internal to a given system, like UTF-EBCDIC (which, I think I heard, has
>not
>If you think something abominable is happening, please raise a loud voice
>and flood UTC members with e-mail and tell everyone what you think and why
>you think it. Nobody can hear you when you mumble.
>
>And it helps if you have solid technical and philosophical arguments to
>convey.
Well, I w
Some people said things like...
>There was another abomination proposed.
>I was choosing not to mention the abominable.
The abominable steam-rollers of history squish those who don't scream and
run; and the few weak survivors are forever cleaning up the resulting
messes.
If you think someth
Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
On 05/25/2001 12:21:13 PM Carl W. Brown wrote:
>Peter,
>
>There was another abomination proposed.
I was choosing not to mention the abominable.
- Peter
ode UTF-8 (was RE: UTF-8 signature in web and email)
>
> On 05/25/2001 12:21:13 PM Carl W. Brown wrote:
>
> >Peter,
> >
> >There was another abomination proposed.
>
> I was choosing not to mention the abominable.
>
>
>
> - Peter
>
>
>
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of [EMAIL PROTECTED]
Sent: Friday, May 25, 2001 8:29 AM
To: [EMAIL PROTECTED]
Subject: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
On 05/25/2001 02:13:36 AM Bill Kurmey wrote:
>Are there no
On 05/25/2001 02:13:36 AM Bill Kurmey wrote:
>Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4
>octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)?
The distinction between the Unicode and ISO versions of UTF-8 is pretty
irrelevant. ISO UTF-8 allows a max
Bill Kurmey scripsit:
> Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4
> octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)?
Theoretically yes.
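The 4-octet versus 6-octet question comes down to how large a scalar value the encoder accepts. A small sketch follows, using the range boundaries of the original ISO 10646 / RFC 2279 scheme; the function name utf8_octets is made up for this illustration.

```python
def utf8_octets(scalar: int) -> int:
    """Number of octets the original (up to 6-octet) UTF-8 scheme
    needs for a given scalar value."""
    if scalar < 0:
        raise ValueError("scalar value must be non-negative")
    # Upper bound of each length class, from 1 octet to 6 octets.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for octets, limit in enumerate(limits, start=1):
        if scalar <= limit:
            return octets
    raise ValueError("scalar value is beyond the 31-bit range")


print(utf8_octets(0x10FFFF))    # highest Unicode code point
print(utf8_octets(0x7FFFFFFF))  # highest 31-bit ISO 10646 value
```

Because Unicode stops at U+10FFFF, every Unicode character fits in at most 4 octets; the 5- and 6-octet forms are only reachable for values above that limit.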
> Is Unicode UTF-8 diverging from ISO by the way in which a scalar value is
> encoded in UTF-8? Should folk
Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4
octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)?
Is Unicode UTF-8 diverging from ISO by the way in which a scalar value is
encoded in UTF-8? Should folks be concerned that the IETF RFC-2279 and
RFC-2781
★じゅういっちゃん★
>>Encoding-aware program that "understand" Unicode, should treat U+FEFF
>>according to its literal meaning: "a non-breaking space having zero width".
I take it that U+FEFF is the Cheshire Cat's favorite character. What about that CLOSED
OPEN E, also? I got quite a l
David Starner wrote:
> > > of now, UTF-8 is just one of many charsets in use on Unix.
> >In fact! So why do Unixers worry about bytes <0xEF, 0xBB,
> 0xBF> [...]
> Because if 0xA0 or 0xA1 0xA1 (or 0x20) show at the start of a script,
> it's wrong. [...]
OK. I had written a reply to all your point
At 11:35 AM 05/23/2001 +0200, Marco Cimarosti wrote:
>David Starner wrote:
> > You're asking for every program to treat UTF-8 specially.
>
>No I am not! I have been saying the exact opposite!
[...]
> > [...]
> > of now, UTF-8 is just one of many charsets in use on Unix.
>
>In fact! So why do Uni
John Cowan wrote:
> Well, "C-like language" is a hedge. IIRC, C99 thinks
> everything above U+007F is a letter.
OK, it was a hedge. I just wanted a scenario of plain text usage familiar to
programmers, and where visualization was not the main thing.
You can choose another example of your choice
David Starner wrote:
> You're asking for every program to treat UTF-8 specially.
No I am not! I have been saying the exact opposite!
ZWNBSP is just one more multibyte character and UTF-8 is just one more
multibyte encoding. Why should this case be so special?
> [...]
> of now, UTF-8 is just one
At 00:07 01/05/23 +0100, Juliusz Chroboczek wrote:
>MS-DOS users, on the other hand, expect applications to have pro-
>prietary formats, and are quite happy to go through convoluted con-
>version procedures in order to access their data (to the extent to
>which they are happy in the first place).
On 23 May 2001, Juliusz Chroboczek wrote:
> Heck, MS-DOS doesn't even have the concept of concatenating plain
> files!
I'm sorry I don't get you. There is the DOS command "COPY A+B C" for that,
with "/A" and "/B" switches for ASCII and binary files, and I have used
that for years. What do you
DS> This will probably just end up as another CRLF/LF issue, requiring
DS> plain text crossing from one system to another be changed.
Yep.
There is a big cultural difference between the MS-DOS and Mac world
and the Unix world. Unix users expect to be able to use the very same
tools for text and
At 11:14 AM 05/22/2001 +0200, you wrote:
>But, also in this case, why should it be a problem to have ZWNBSP in
>whatever position in a file? Why should *this* character be more of a problem
>than SPACE, or TAB, or CARRIAGE RETURN, or COMMA, or name it?
Because SPACE, TAB, CARRIAGE RETURN, or COMMA d
John Cowan wrote:
> > [...] U+FEFF: [...]
> > it (also) is a "ZERO WIDTH NO-BREAK SPACE".
>
> Actually, this semantic seems to be going away soon, but
> until it does...
My only information about the UTC's decisions is what passes on this mailing
list, so I trust you.
But I know that character
Marco Cimarosti scripsit:
> You forget one fundamental thing about U+FEFF: it is not (only) a "byte
> order mark" or an "encoding signature": it (also) is a "ZERO WIDTH NO-BREAK
> SPACE".
Actually, this semantic seems to be going away soon, but until it does...
> I.e., it has been designed to b
David Starner wrote:
> [...] At the fundamental heart of a Unix system is
> passing arbitrary byte streams in highly flexible
> ways. If every file starts with a signature then
> that makes that significantly more complex. [...]
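The byte-stream concern can be illustrated with plain concatenation, the programmatic equivalent of joining two files. This is a sketch with made-up file contents; it uses Python's "utf-8-sig" codec, which removes only a signature at the very start of the data.

```python
import codecs

# Two "files", each written with a UTF-8 signature (EF BB BF) in front.
first = codecs.BOM_UTF8 + "one\n".encode("utf-8")
second = codecs.BOM_UTF8 + "two\n".encode("utf-8")

# Byte-level concatenation knows nothing about signatures.
joined = first + second

# The leading signature is stripped; the second one survives as U+FEFF
# in the middle of the text.
decoded = joined.decode("utf-8-sig")
print(repr(decoded))
```

Whether that leftover U+FEFF is harmless (a zero-width no-break space) or a defect depends on what the consuming program expects, which is the disagreement in this thread.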
You forget one fundamental thing about U+FEFF: it is not (only) a "
In a message dated 2001-05-18 13:25:06 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> Last year, as previously the year before, we discussed the
> possibility of defining some standard Unicode plain text formats. The
> discussions foundered on the differences between text files meant fo
At 11:39 AM 05/21/2001 -0400, [EMAIL PROTECTED] wrote:
>In the Windows world that I live in, we expect to update our compilers and
>other tools every few years, for a variety of reasons (not all of which have
>to do with marketing or planned obsolescence). This is both good and bad,
>but in gener
In a message dated 2001-05-18 0:50:13 Pacific Daylight Time, [EMAIL PROTECTED]
writes:
> People using this heuristic, who didn't really think it would
> work that well after the talk, have confirmed later that it
> actually works extremely well (and they were writing production
> code, not j
From: "Edward Cherlin" <[EMAIL PROTECTED]>
"A text file with a BOM is, if not rich text, at least above the poverty
line."
(modified from Ed's prior msg -- this one is a keeper!)
michka
michka
the only book on internationalization in VB at
http://www.i18nWithVB.com/
- Original Message -
From: "Edward Cherlin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, May 18, 2001 1:08 PM
Subject: Re: UTF-8 signature in web and email
>
At 10:58 PM -0400 5/17/01, [EMAIL PROTECTED] wrote:
>The "UTF-8 signature" discussion appears every few months on this list,
>usually as a religious debate between those who believe in it and those who
>do not. Be forewarned, my religion may not match yours. :-)
My religion suggests that we fin
At 22:58 01/05/17 -0400, [EMAIL PROTECTED] wrote:
>Martin Dürst wrote:
>
> > There is about 5% of a justification
> > for having a 'signature' on a plain-text, standalone file (the reason
> > being that it's somewhat easier to detect that the file is UTF-8 from the
> > signature than to read
The "UTF-8 signature" discussion appears every few months on this list,
usually as a religious debate between those who believe in it and those who
do not. Be forewarned, my religion may not match yours. :-)
Keld Jørn Simonsen wrote:
> For UTF-8 there is no need to have a BOM, as there is on
Delurking for a moment for a few points of clarification please.
What is the definition of 'signature'? Does 'signature' in this thread's
context, include the XML 4-byte declarations (charset.html#h-5.2.1) without
the BOM as defined in this section?
Are you folks advocating that the BOM is
t; <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, May 16, 2001 00:57
Subject: Re: UTF-8 signature in web and email
> For UTF-8 there is no need to have a BOM, as there is only one
> way of serializing octets in UTF-8. There is no little
l Message -
From: "Martin Duerst" <[EMAIL PROTECTED]>
To: "Roozbeh Pournader" <[EMAIL PROTECTED]>; "Unicode List"
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, May 15, 2001 6:55 PM
Subject: Re: UTF-8 signature in web and em
Keld Jørn Simonsen wrote:
> For UTF-8 there is no need to have a BOM, as there is only one
> way of serializing octets in UTF-8. There is no little-endian
> or big-endian. A BOM is superfluous and will be ignored.
Not so. In plain text, it is a useful signature to distinguish UTF-8 from
other thi
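For readers unfamiliar with what the signature looks like on the wire, here is a minimal sketch of checking for it and removing it. The helper name split_signature is hypothetical; codecs.BOM_UTF8 is the standard-library constant for the three octets EF BB BF.

```python
import codecs


def split_signature(data: bytes):
    """Return (has_signature, payload) for a byte string that may start
    with the UTF-8 encoding of U+FEFF."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


signed = codecs.BOM_UTF8 + "plain text".encode("utf-8")
unsigned = "plain text".encode("utf-8")

print(split_signature(signed))
print(split_signature(unsigned))
```

The check says nothing about byte order, since UTF-8 has only one serialization; it only serves as a hint that the data is UTF-8 rather than some other encoding.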
Hello Roozbeh
At 04:02 01/05/15 +0430, Roozbeh Pournader wrote:
>Well, I received a UTF-8 email from Microsoft's Dr International today. It
>was a "multipart/alternative", with both the "text/plain" and "text/html"
>in UTF-8. Well, nothing interesting yet, but the interesting point was
>that the
This mail, addressed to <[EMAIL PROTECTED]>, was, presumably, intended
for <[EMAIL PROTECTED]>.
Misha
On 15/05/2001 00:32:24 Roozbeh Pournader wrote:
> Well, I received a UTF-8 email from Microsoft's Dr International today. It
> was a "multipart/alternative", with both the "text/plain" and "t
On Tue, 15 May 2001, Richard, Francois M wrote:
> UTF-8 is considered a character encoding form like any other...
> For UTF-16 only, the BOM is recommended.
> See http://www.w3.org/TR/REC-html40/charset.html#h-5.2.1
So BOM for UTF-8 HTML is neither recommended nor discouraged? Does anyone
agree
UTF-8 is considered a character encoding form like any other...
For UTF-16 only, the BOM is recommended.
See http://www.w3.org/TR/REC-html40/charset.html#h-5.2.1
For character encoding determination (See
http://www.w3.org/TR/REC-html40/charset.html#h-5.2) , priorities are defined
as follows (h
61 matches