RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-06-07 Thread Christopher John Fynn
Simon Law wrote: << In Oracle9i our next Database Release shipping this summer, we have introduced support for two new Unicode character sets. ...>> New character *sets* ???

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-31 Thread Ayers, Mike
> From: Carl W. Brown [mailto:[EMAIL PROTECTED]] > > I resisted calling it FTF-8 (Funky Transfer Format - 8), but > if you want to > call it Weird Transfer Format - 8, I don't have any real objections. Well, that's ONE possible translation of "WTF"... /|/|ike

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-31 Thread Carl W. Brown
wn'; Simon Law; [EMAIL PROTECTED] Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) If you have this funny encoding please don't call it UTF8 because it is not UTF8 and will only confuse users. You could call it OTF8 or something like that but not UTF8.

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-31 Thread Ayers, Mike
If you have this funny encoding please don't call it UTF8 because it is not UTF8 and will only confuse users. You could call it OTF8 or something like that but not UTF8. How about "WTF-8"? Sorry - I couldn't resist. /|/|ike

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-31 Thread Carl W. Brown
ut not UTF8. Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon Law Sent: Wednesday, May 30, 2001 11:02 AM To: [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Hi Folks, Over the last f

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-31 Thread Vaintroub, Wladislav
acters from all scripts are represented in 2 bytes. Comments? -Original Message- From: Simon Law [mailto:[EMAIL PROTECTED]] Sent: Wednesday, May 30, 2001 8:02 PM To: [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Hi Folks, O

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Michael \(michka\) Kaplan
someone emits the b michka - Original Message - From: "Simon Law" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, May 30, 2001 11:01 AM Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) > Hi Folks, > > Over the last

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Michael \(michka\) Kaplan
Simon, Would you care to answer (officially) why exactly Oracle needs for anything to be done here? Per the spec, it is not illegal for a process to interpret 5/6-byte supplementary characters; it is only illegal to emit them. It seems that Oracle and everyone else is well covered with the existi
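A minimal Python sketch of the "interpret but never emit" position, assuming the irregular form in question is a supplementary character written as a surrogate pair of two 3-byte sequences (6 bytes), as in the UTF-8S proposal discussed in this thread. The function names are hypothetical; the `surrogatepass` error handler is standard Python.

```python
def decode_lenient(data: bytes) -> str:
    """Hypothetical reader: accept standard UTF-8 and also surrogate pairs
    that were written as two 3-byte sequences (the 6-byte form)."""
    # 'surrogatepass' lets the UTF-8 codec turn ED A0..BF xx sequences into
    # lone surrogate code points instead of raising UnicodeDecodeError.
    text = data.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 to join each high+low surrogate pair into one
    # supplementary code point. Unpaired surrogates still raise an error here.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def encode_strict(text: str) -> bytes:
    """Hypothetical writer: emit only standard UTF-8 (4 bytes per supplementary character)."""
    return text.encode("utf-8")  # raises on lone surrogates


six_byte_form = b"\xed\xa0\x80\xed\xb0\x80"   # U+10000 as two encoded surrogates
ch = decode_lenient(six_byte_form)           # accepted on input...
print(encode_strict(ch).hex())               # ...but written back out as f0908080
```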

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Carl W. Brown
8 and UTF-32 system that sort like UTF-16 is folly. Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon Law Sent: Wednesday, May 30, 2001 11:02 AM To: [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in we

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Ayers, Mike
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > According to the proposal, UTF-8S and UTF-32S would not have the same > status: they wouldn't be for interchange; they'd just be for > representation > internal to a given system, like UTF-EBCDIC (which, I think I > heard, has > not actual

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Simon Law
Tuesday, May 29, 2001 3:47 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Carl, > Ken, > > UTF-8s is essentially a way to ignore surrogate processing.  It allows a > company to encode UTF-16
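To illustrate "encoding UTF-16 with UCS-2 logic", here is a hypothetical UTF-8S encoder in Python: it applies the UTF-8 bit layout to each UTF-16 code unit separately, so a surrogate pair becomes two 3-byte sequences. This is a sketch of the idea as described in the thread, not Oracle's implementation; the same scheme was later documented as CESU-8 (Unicode Technical Report #26).

```python
def encode_utf8s(text: str) -> bytes:
    """Hypothetical UTF-8S encoder: UTF-8 applied to UTF-16 code units.

    BMP characters come out exactly as in standard UTF-8; each supplementary
    character becomes two 3-byte sequences (one per surrogate), i.e. 6 bytes
    instead of the 4 bytes standard UTF-8 uses.
    """
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")  # one UTF-16 code unit
        # Encode the code unit on its own, surrogates included ("UCS-2 logic").
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


for ch in ("A", "\u00e9", "\U00010000"):
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(), encode_utf8s(ch).hex())
```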

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread
From: "Carl W. Brown" <[EMAIL PROTECTED]>; To: [EMAIL PROTECTED]; Cc: Date: 01/05/30 0:46 Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) >Ken, > >I suspect that Oracle is specifically pushing for this standard because

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Carl W. Brown
ay, May 29, 2001 3:47 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Carl, > Ken, > > UTF-8s is essentially a way to ignore surrogate processing. It allows a > company to encode UTF-16 wit

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Kenneth Whistler
Carl, > Ken, > > UTF-8s is essentially a way to ignore surrogate processing. It allows a > company to encode UTF-16 with UCS-2 logic. > > The problem is that by not implementing surrogate support you can introduce > subtle errors. For example it is common to break buffers apart into > segment
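Carl's buffer-splitting point can be sketched in Python. Assuming text is held as a list of UTF-16 code units (the helper `split_utf16` is hypothetical), a chunker that only counts units can cut between a high and a low surrogate; backing up one unit when the cut lands right after a high surrogate keeps each pair intact.

```python
def split_utf16(units: list, max_units: int) -> list:
    """Split UTF-16 code units into chunks of at most max_units,
    never separating a high surrogate from the unit that follows it."""
    if max_units < 2:
        raise ValueError("max_units must be at least 2 to hold a surrogate pair")
    chunks = []
    start = 0
    while start < len(units):
        end = min(start + max_units, len(units))
        # If the cut would fall right after a high surrogate (D800-DBFF) and
        # more data follows, back up one unit so the pair stays together.
        if end < len(units) and 0xD800 <= units[end - 1] <= 0xDBFF:
            end -= 1
        chunks.append(units[start:end])
        start = end
    return chunks


# "a", U+10000 (D800 DC00), "b"
units = [0x61, 0xD800, 0xDC00, 0x62]
naive = [units[i:i + 2] for i in range(0, len(units), 2)]  # splits the pair
print([[hex(u) for u in c] for c in naive])
print([[hex(u) for u in c] for c in split_utf16(units, 2)])
```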

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Carl W. Brown
. Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Kenneth Whistler Sent: Tuesday, May 29, 2001 11:18 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Doug

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Jianping Yang
Antoine Leca wrote: > Jianping Yang wrote: > > > > As a matter of fact, the surrogate or supplementary character was not defined > > in the past, > > How long is "the past"? I remember reading about these surrogates the first > time I put my hands on a draft copy of ISO 10646. It was nearly six

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Kenneth Whistler
Doug wrote: > UTF-8 and UTF-32 should absolutely not be similarly hacked to maintain some > sort of bizarre "compatibility" with the binary sorting order of UTF-16. > UTC should not, and almost certainly will not, endorse such a proposal on the > part of the database vendors. I would be l

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Peter_Constable
On 05/27/2001 08:03:37 PM Jianping Yang wrote: >>But it seems to me that we've lived without >>Premise B in the past, and that it won't benefit us to adopt it now. Why >>bother with it? Why not continue doing what we already know how to do? >As a matter of fact, the surrogate or supplementary c

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread Michael \(michka\) Kaplan
From: "Jianping Yang" <[EMAIL PROTECTED]> > As a matter of fact, the surrogate or supplementary > character was not defined in the past, so we could > live without Premise B in the past. But now the > supplementary character is defined and will soon be > supported, we have to bother with it. Poo

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread Carl W. Brown
- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED] Sent: Monday, May 28, 2001 3:30 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) In a message dated 2001-05-26 16:00:47 Pacific Day

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread DougEwell2
In a message dated 2001-05-26 16:00:47 Pacific Daylight Time, [EMAIL PROTECTED] writes: > The issue is this: Unicode's three encoding forms don't sort in the same > way when sorting is done using that most basic and > valid-in-almost-no-locales-but-easy-and-quick approach of simply comparing
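The sort-order difference is easy to reproduce. The sketch below sorts U+FF61 (a BMP character above the surrogate block) and U+10000 (the first supplementary character) by raw byte comparison of each encoding form. UTF-8 and UTF-32 agree with code point order; UTF-16 does not, because U+10000 is stored as the surrogate pair D800 DC00, which compares below FF61.

```python
# U+FF61 sits above the surrogate block; U+10000 is the first supplementary character.
chars = ["\uFF61", "\U00010000"]


def binary_order(encoding: str) -> list:
    """Sort chars by plain byte-wise comparison of their encoded forms."""
    return sorted(chars, key=lambda s: s.encode(encoding))


# UTF-16 and UTF-32 use the big-endian forms so byte order matches code-unit order.
utf8 = binary_order("utf-8")
utf16 = binary_order("utf-16-be")
utf32 = binary_order("utf-32-be")

for name, result in (("UTF-8", utf8), ("UTF-16", utf16), ("UTF-32", utf32)):
    print(name, [f"U+{ord(c):04X}" for c in result])
# UTF-8 ['U+FF61', 'U+10000']
# UTF-16 ['U+10000', 'U+FF61']
# UTF-32 ['U+FF61', 'U+10000']
```

Under UTF-8S, U+10000 would begin with the byte ED (from the encoded surrogate) and so sort before U+FF61 (EF BD A1), matching UTF-16. That binary compatibility is what the proposal was after.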

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread Antoine Leca
Jianping Yang wrote: > > As a matter of fact, the surrogate or supplementary character was not defined > in the past, How long is "the past"? I remember reading about these surrogates the first time I put my hands on a draft copy of ISO 10646. It was nearly six years ago. Or do you mean that it

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread
★じゅういっちゃん★ EKYWY TXLY NPZ P MPVD XPHYV LPWWQY NKT ZPN XT WYPZTX PE PMM ET HPWWD "EYX EKTSZPXV'Z HTWY GSX P XSHOYW EKPX TXY PXV LTHHQEHYXE, ET HY, QZ RSQEY ZLPWD" >> >> >There was another abomination proposed. Oracle rather than adding UTF-16 >> >support proposed that non plan

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-27 Thread Jianping Yang
I don't want to argue on this lengthy email, but only point two facts: >According to the proposal, UTF-8S and UTF-32S would not have the same >status: they wouldn't be for interchange; they'd just be for representation >internal to a given system, like UTF-EBCDIC (which, I think I heard, has >not

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-26 Thread Peter_Constable
>If you think something abominable is happening, please raise a loud voice >and flood UTC members with e-mail and tell everyone what you think and why >you think it. Nobody can hear you when you mumble. > >And it helps if you have solid technical and philosophical arguments to convey. Well, I w

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Rick McGowan
Some people said things like... >There was another abomination proposed. >I was choosing not to mention the abominable. The abominable steam-rollers of history squish those who don't scream and run; and the few weak survivors are forever cleaning up the resulting messes. If you think someth

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Carl W. Brown
Unicode UTF-8 (was RE: UTF-8 signature in web and email) On 05/25/2001 12:21:13 PM Carl W. Brown wrote: >Peter, > >There was another abomination proposed. I was choosing not to mention the abominable. - Peter

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Michael \(michka\) Kaplan
ode UTF-8 (was RE: UTF-8 signature in web and email) > > On 05/25/2001 12:21:13 PM Carl W. Brown wrote: > > >Peter, > > > >There was another abomination proposed. > > I was choosing not to mention the abominable. > > > > - Peter > > >

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Carl W. Brown
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED] Sent: Friday, May 25, 2001 8:29 AM To: [EMAIL PROTECTED] Subject: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) On 05/25/2001 02:13:36 AM Bill Kurmey wrote: >Are there no

ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Peter_Constable
On 05/25/2001 02:13:36 AM Bill Kurmey wrote: >Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4 >octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)? The distinction between the Unicode and ISO versions of UTF-8 is pretty irrelevant. ISO UTF-8 allows a max
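A small Python sketch of why the distinction rarely matters in practice: the original 31-bit UTF-8 length rules (up to 6 bytes, per RFC 2279) and the Unicode ceiling of U+10FFFF give identical byte lengths for every code point Unicode can actually assign, and never more than 4 bytes. The names are illustrative, and surrogate code points are not excluded here.

```python
UNICODE_MAX = 0x10FFFF    # Unicode code space ceiling
ISO_MAX = 0x7FFFFFFF      # 31-bit UCS-4 ceiling (ISO/IEC 10646, RFC 2279)

# Largest code point representable in 1, 2, 3, 4, 5 and 6 bytes.
_LIMITS = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)


def utf8_length(cp: int, ceiling: int = UNICODE_MAX) -> int:
    """Number of UTF-8 bytes for code point cp, given an upper ceiling."""
    if not 0 <= cp <= ceiling:
        raise ValueError(f"U+{cp:X} is above the ceiling U+{ceiling:X}")
    for length, limit in enumerate(_LIMITS, start=1):
        if cp <= limit:
            return length
    raise ValueError("ceiling exceeds 31 bits")  # unreachable when ceiling <= ISO_MAX


print(utf8_length(0x10FFFF), utf8_length(0x10FFFF, ISO_MAX))  # same answer: 4
print(utf8_length(0x7FFFFFFF, ISO_MAX))                       # only ISO allows this: 6
```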

Re: UTF-8 signature in web and email

2001-05-25 Thread John Cowan
Bill Kurmey scripsit: > Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4 > octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)? Theoretically yes. > Is Unicode UTF-8 diverging from ISO by the way in which a scalar value is > encoded in UTF-8? Should folk

RE: UTF-8 signature in web and email

2001-05-25 Thread Bill Kurmey
Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4 octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)? Is Unicode UTF-8 diverging from ISO by the way in which a scalar value is encoded in UTF-8? Should folks be concerned that the IETF RFC-2279 and RFC-2781

RE: UTF-8 signature in web and email

2001-05-24 Thread
★じゅういっちゃん★ >>Encoding-aware program that "understand" Unicode, should treat U+FEFF >>according to its literal meaning: "a non-breaking space having zero width". I take it that U+FEFF is the Cheshire Cat's favorite character. What about that CLOSED OPEN E, also? I got quite a l

RE: UTF-8 signature in web and email

2001-05-24 Thread Marco Cimarosti
David Starner wrote: > > > of now, UTF-8 is just one of many charsets in use on Unix. > >In fact! So why do Unixers worry about bytes <0xEF, 0xBB, > 0xBF> [...] > Because if 0xA0 or 0xA1 0xA1 (or 0x20) show at the start of a script, > it's wrong. [...] OK. I had written a reply to all your point
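David's point about script files can be shown with a byte-level check. Unix kernels decide whether a file is an interpreter script by looking for the bytes `#!` at offset 0; the hypothetical `has_shebang` below mimics that test, and a leading UTF-8 signature (EF BB BF) makes it fail.

```python
UTF8_SIGNATURE = b"\xef\xbb\xbf"


def has_shebang(script: bytes) -> bool:
    """Hypothetical check mimicking the kernel: '#!' must be the first two bytes."""
    return script[:2] == b"#!"


plain = b"#!/bin/sh\necho hello\n"
signed = UTF8_SIGNATURE + plain

print(has_shebang(plain))   # True
print(has_shebang(signed))  # False: the file now starts with EF BB
```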

RE: UTF-8 signature in web and email

2001-05-23 Thread David Starner
At 11:35 AM 05/23/2001 +0200, Marco Cimarosti wrote: >David Starner wrote: > > You're asking for every program to treat UTF-8 specially. > >No I am not! I have been saying the exact opposite! [...] > > [...] > > of now, UTF-8 is just one of many charsets in use on Unix. > >In fact! So why do Uni

RE: UTF-8 signature in web and email

2001-05-23 Thread Marco Cimarosti
John Cowan wrote: > Well, "C-like language" is a hedge. IIRC, C99 thinks everything above U+007F is a letter. OK, it was a hedge. I just wanted a scenario of plain text usage familiar to programmers, and where visualization was not the main thing. You can choose another example of your choice

RE: UTF-8 signature in web and email

2001-05-23 Thread Marco Cimarosti
David Starner wrote: > You're asking for every program to treat UTF-8 specially. No I am not! I have been saying the exact opposite! ZWNBSP is just one more multibyte character and UTF-8 is just one more multibyte encoding. Why should this case be so special? > [...] > of now, UTF-8 is just one

Re: UTF-8 signature in web and email

2001-05-22 Thread Martin Duerst
At 00:07 01/05/23 +0100, Juliusz Chroboczek wrote: >MS-DOS users, on the other hand, expect applications to have pro- >prietary formats, and are quite happy to go through convoluted con- >version procedures in order to access their data (to the extent to >which they are happy in the first place).

Re: UTF-8 signature in web and email

2001-05-22 Thread Roozbeh Pournader
On 23 May 2001, Juliusz Chroboczek wrote: > Heck, MS-DOS doesn't even have the concept of concatenating plain > files! I'm sorry I don't get you. There is the DOS command "COPY A+B C" for that, with "/A" and "/B" switches for ASCII and binary files, and I have used that for years. What do you

Re: UTF-8 signature in web and email

2001-05-22 Thread Juliusz Chroboczek
DS> This will probably just end up as another CRLF/LF issue, requiring DS> plain text crossing from one system to another be changed. Yep. There is a big cultural difference between the MS-DOS and Mac world and the Unix world. Unix users expect to be able to use the very same tools for text and

RE: UTF-8 signature in web and email

2001-05-22 Thread David Starner
At 11:14 AM 05/22/2001 +0200, you wrote: >But, also in this case, why should it be a problem to have ZWNBSP in >whatever position in a file? Why should *this* character be more a problem >that SPACE, or TAB, or CARRIAGE RETURN, or COMMA, or name it? Because SPACE, TAB, CARRIAGE RETURN, or COMMA d

RE: UTF-8 signature in web and email

2001-05-22 Thread Marco Cimarosti
John Cowan wrote: > > [...] U+FEFF: [...] > > it (also) is a "ZERO WIDTH NO-BREAK SPACE". > > Actually, this semantic seems to be going away soon, but > until it does... My only information about the UTC's decisions is what passes on this mailing list, so I trust you. But I know that character

Re: UTF-8 signature in web and email

2001-05-22 Thread John Cowan
Marco Cimarosti scripsit: > You forget one fundamental thing about U+FEFF: it is not (only) a "byte > order mark" or an "encoding signature": it (also) is a "ZERO WIDTH NO-BREAK > SPACE". Actually, this semantic seems to be going away soon, but until it does... > I.e., it has been designed to b

RE: UTF-8 signature in web and email

2001-05-22 Thread Marco Cimarosti
David Starner wrote: > [...] At the fundamental heart of a Unix system is > passing arbitrary byte streams in highly flexible > ways. If every file starts with a signature then > that makes that significantly more complex. [...] You forget one fundamental thing about U+FEFF: it is not (only) a "
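A sketch of the concatenation problem David raises, in Python: two files that each begin with a UTF-8 signature are joined byte-for-byte, as `cat` would do. Python's `utf-8-sig` codec removes a signature only at the start of the data, so the second one survives as U+FEFF, which a reader then has to treat as ZERO WIDTH NO-BREAK SPACE.

```python
BOM = "\ufeff"  # U+FEFF: byte order mark / ZERO WIDTH NO-BREAK SPACE

# Two files, each written with a leading UTF-8 signature (EF BB BF).
file_a = (BOM + "first file\n").encode("utf-8")
file_b = (BOM + "second file\n").encode("utf-8")

# Byte-level concatenation, which is all `cat file_a file_b` does.
combined = file_a + file_b

# 'utf-8-sig' strips a signature only at the very start of the input...
text = combined.decode("utf-8-sig")

# ...so the second signature is still there, now in the middle of the text.
print(repr(text))
```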

Re: UTF-8 signature in web and email

2001-05-21 Thread DougEwell2
In a message dated 2001-05-18 13:25:06 Pacific Daylight Time, [EMAIL PROTECTED] writes: > Last year, as previously the year before, we discussed the > possibility of defining some standard Unicode plain text formats. The > discussions foundered on the differences between text files meant fo

Re: UTF-8 signature in web and email

2001-05-21 Thread David Starner
At 11:39 AM 05/21/2001 -0400, [EMAIL PROTECTED] wrote: >In the Windows world that I live in, we expect to update our compilers and >other tools every few years, for a variety of reasons (not all of which have >to do with marketing or planned obsolescence). This is both good and bad, >but in gener

Re: UTF-8 signature in web and email

2001-05-21 Thread DougEwell2
In a message dated 2001-05-18 0:50:13 Pacific Daylight Time, [EMAIL PROTECTED] writes: > People using this heuristic, who didn't really think it would > work that well after the talk, have confirmed later that it > actually works extremely well (and they were writing production > code, not j
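Assuming the heuristic discussed here is the familiar one of checking whether the bytes are well-formed UTF-8 (text in legacy 8-bit encodings very rarely is), a minimal Python version looks like this. The function name is hypothetical, and treating pure ASCII as "no evidence" is a design choice of this sketch.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: guess UTF-8 if the data has non-ASCII bytes and is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict mode rejects malformed sequences
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid UTF-8 but gives no evidence either way.
    return any(b >= 0x80 for b in data)


print(looks_like_utf8("café".encode("utf-8")))    # True
print(looks_like_utf8("café".encode("latin-1")))  # False: lone 0xE9 is not valid UTF-8
```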

Re: UTF-8 signature in web and email

2001-05-18 Thread Michael \(michka\) Kaplan
From: "Edward Cherlin" <[EMAIL PROTECTED]> "A text file with a BOM is, if not rich text, at least above the poverty line." (modified from Ed's prior msg -- this one is a keeper!) michka

Re: UTF-8 signature in web and email

2001-05-18 Thread Michael \(michka\) Kaplan
michka the only book on internationalization in VB at http://www.i18nWithVB.com/ - Original Message - From: "Edward Cherlin" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Friday, May 18, 2001 1:08 PM Subject: Re: UTF-8 signature in web and email >

Re: UTF-8 signature in web and email

2001-05-18 Thread Edward Cherlin
At 10:58 PM -0400 5/17/01, [EMAIL PROTECTED] wrote: >The "UTF-8 signature" discussion appears every few months on this list, >usually as a religious debate between those who believe in it and those who >do not. Be forewarned, my religion may not match yours. :-) My religion suggests that we fin

Re: UTF-8 signature in web and email

2001-05-18 Thread Martin Duerst
At 22:58 01/05/17 -0400, [EMAIL PROTECTED] wrote: >Martin Dürst wrote: > > > There is about 5% of a justification > > for having a 'signature' on a plain-text, standalone file (the reason > > being that it's somewhat easier to detect that the file is UTF-8 from the > > signature than to read

Re: UTF-8 signature in web and email

2001-05-17 Thread DougEwell2
The "UTF-8 signature" discussion appears every few months on this list, usually as a religious debate between those who believe in it and those who do not. Be forewarned, my religion may not match yours. :-) Keld Jørn Simonsen wrote: > For UTF-8 there is no need to have a BOM, as there is on

Re: UTF-8 signature in web and email

2001-05-16 Thread Bill Kurmey
Delurking for a moment for a few points of clarification please. What is the definition of 'signature'? Does 'signature' in this thread's context, include the XML 4-byte declarations (charset.html#h-5.2.1) without the BOM as defined in this section? Are you folks advocating that the BOM is

Re: UTF-8 signature in web and email

2001-05-16 Thread Mark Davis
t; <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Wednesday, May 16, 2001 00:57 Subject: Re: UTF-8 signature in web and email > For UTF-8 there is no need to have a BOM, as there is only one > way of serializing octets in UTF-8. There is no little

Re: UTF-8 signature in web and email

2001-05-16 Thread Michael \(michka\) Kaplan
l Message - From: "Martin Duerst" <[EMAIL PROTECTED]> To: "Roozbeh Pournader" <[EMAIL PROTECTED]>; "Unicode List" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Tuesday, May 15, 2001 6:55 PM Subject: Re: UTF-8 signature in web and em

RE: UTF-8 signature in web and email

2001-05-16 Thread Marco Cimarosti
Keld Jørn Simonsen wrote: > For UTF-8 there is no need to have a BOM, as there is only one > way of serializing octets in UTF-8. There is no little-endian > or big-endian. A BOM is superfluous and will be ignored. Not so. In plain text, it is a useful signature to distinguish UTF-8 from other thi
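Marco's point is that the signature is a cheap way to tell UTF-8 from other encodings before reading the body. A minimal sniffing sketch in Python, using the signature constants from the standard `codecs` module; the function name and the set of encodings checked are assumptions. UTF-32 signatures are deliberately left out, since the UTF-32LE signature begins with the UTF-16LE one and would need to be tested first.

```python
import codecs
from typing import Optional, Tuple

# Signature bytes and the encoding each one announces.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_signature(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, signature length) if data starts with a known signature, else (None, 0)."""
    for sig, name in SIGNATURES:
        if data.startswith(sig):
            return name, len(sig)
    return None, 0


raw = b"\xef\xbb\xbfhello"
encoding, skip = sniff_signature(raw)
print(encoding, raw[skip:].decode(encoding))  # utf-8 hello
```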

Re: UTF-8 signature in web and email

2001-05-16 Thread Martin Duerst
Hello Roozbeh At 04:02 01/05/15 +0430, Roozbeh Pournader wrote: >Well, I received a UTF-8 email from Microsoft's Dr International today. It >was a "multipart/alternative", with both the "text/plain" and "text/html" >in UTF-8. Well, nothing interesting yet, but the interesting point was >that the

Re: UTF-8 signature in web and email

2001-05-15 Thread Misha Wolf
This mail, addressed to <[EMAIL PROTECTED]>, was, presumably, intended for <[EMAIL PROTECTED]>. Misha On 15/05/2001 00:32:24 Roozbeh Pournader wrote: > Well, I received a UTF-8 email from Microsoft's Dr International today. It > was a "multipart/alternative", with both the "text/plain" and "t

RE: UTF-8 signature in web and email

2001-05-15 Thread Roozbeh Pournader
On Tue, 15 May 2001, Richard, Francois M wrote: > UTF-8 is considered as a character encoding form as any other... > For UTF-16 only, the BOM is recommended. > See http://www.w3.org/TR/REC-html40/charset.html#h-5.2.1 So BOM for UTF-8 HTML is neither recommended nor discouraged? Does anyone agree

RE: UTF-8 signature in web and email

2001-05-15 Thread Richard, Francois M
UTF-8 is considered as a character encoding form as any other... For UTF-16 only, the BOM is recommended. See http://www.w3.org/TR/REC-html40/charset.html#h-5.2.1 For character encoding determination (See http://www.w3.org/TR/REC-html40/charset.html#h-5.2) , priorities are defined as follow (h
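For reference, HTML 4.01 section 5.2.2 (under the "#h-5.2" link above) lists the priorities, highest first, as: the HTTP `charset` parameter on `Content-Type`, then a `META` declaration with `http-equiv` set to `Content-Type`, then a `charset` attribute on the element that points to the resource. A hypothetical resolver in Python, with the final fallback (heuristics or user settings, which the spec leaves to the user agent) passed in as a parameter:

```python
from typing import Optional


def resolve_charset(http_charset: Optional[str],
                    meta_charset: Optional[str],
                    element_charset: Optional[str],
                    fallback: str) -> str:
    """Hypothetical helper: pick a document's encoding in HTML 4.01 priority order."""
    for candidate in (http_charset,      # 1. charset parameter of the HTTP Content-Type header
                      meta_charset,      # 2. <META http-equiv="Content-Type" content="...; charset=...">
                      element_charset):  # 3. charset attribute on the linking element
        if candidate:
            return candidate
    # Outside the list: user-agent heuristics or user settings (assumed here).
    return fallback


print(resolve_charset("UTF-8", "ISO-8859-1", None, "windows-1252"))  # UTF-8
print(resolve_charset(None, "Shift_JIS", "UTF-8", "windows-1252"))   # Shift_JIS
```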