Well... there was only one Unicode in those days. But the vagueness
persisted long after.
where it really doesn't matter. But in the development docs it is a real
problem.
Of course, I understand that software development cycles, the size of the
Actually, the problem is the *same old thing*: no education about I18N
issues in general. There are all sorts of interesting "biases" about
Unicode related to the still lamentable level of I18N training that the
average developer receives.
It's simply shocking.
Best regards,
Addison
On Wed,
I apologize.
Jony
-----Original Message-----
From: Becker [mailto:Becker]
Sent: Friday, July 21, 2000 10:34 PM
To: Unicode List
Cc: Myself
Subject: RE: Unicode in VFAT file system
Jony Rosenne, who has been a great contributor since or before the
beginning, wrote in an off moment
-----Original Message-----
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 21, 2000 9:44 AM
To: Unicode List
Subject: Re: Unicode in VFAT file system
Although there is some truth here, the fact is that it is not really true
today that everyone equates the two. The default thought
In the meantime, Microsoft is still pretty firmly rooted in the idea that
Unicode=UCS-2 (or UTF-16LE on Windows 2000).
I don't think we can make a blanket statement about MS being firmly rooted
in UCS-2. They're very big and manage lots of code that a lot of people use
on a regular basis, and
On 07/21/2000 04:42:05 AM [EMAIL PROTECTED] wrote:
Unicode is the code, which is based on 16 bit chunks of ether or whatever,
and
UTF-8 is a biased transformation format...
That's too simple to capture the current reality, as others have been
indicating. The full story is available in UTR-17,
Asmus Freytag wrote:
At 09:53 AM 7/20/00 -0800, Ken Krugler wrote:
2. Is little-endian UCS-2 a valid encoding that I just don't
know about?
Yes, it is. Your example of the VFAT system is a near-perfect case, since
the details of it form what Unicode calls a 'Higher level protocol' and
those may legitimately override
Unicode has changed and evolved over the years. At this point, UCS-2 is a funny
beast, because it shares precisely the same encoding space as UTF-16. That is,
in code units there is absolutely no difference between them. The only real
difference is whether you interpret the code units in the
As a serialization, UTF-16 has three forms: UTF-16, UTF-16BE, and UTF-16LE.
The first optionally begins with a BOM; the other two never carry one.
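To make the three forms concrete, here is a rough Python sketch (the codec
names are Python's labels for the three schemes, not part of the Standard):

    import codecs

    s = "A\u00e9"  # sample text: "A" followed by an accented "e"

    # UTF-16BE / UTF-16LE: fixed byte order, never a BOM.
    print(s.encode("utf-16-be").hex())  # 004100e9
    print(s.encode("utf-16-le").hex())  # 4100e900

    # Plain UTF-16: the encoder prepends a BOM and then uses the
    # machine's native byte order (little-endian on Intel).
    data = s.encode("utf-16")
    print(data.startswith(codecs.BOM_UTF16))  # True
    print(data.hex())  # fffe4100e900 on a little-endian machine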
I know this is what the Standard dictates, and I think I understand why,
but it doesn't make complete sense to the novice trying to find his/her
way:
[EMAIL PROTECTED] wrote:
Why does it say there are three varieties when a 16-bit datum can only be
serialised in two orders?
The simplest way to think about it is to remember that a MIME charset is meant
to provide *minimal* information for the receiver to convert bytes into
characters. If
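The receiving side of that contract can be sketched in a few lines of Python
(illustrative only): under the two suffixed labels the byte order is fixed,
and only the plain label asks the decoder to look for a BOM.

    # The same two bytes decode to different characters under the
    # two fixed-order labels.
    raw = b"\x03\xa9"
    print(raw.decode("utf-16-be"))  # U+03A9, Greek capital omega
    print(raw.decode("utf-16-le"))  # U+A903, something else entirely

    # Under the unsuffixed label, a leading BOM selects the byte
    # order and is consumed rather than returned as a character.
    print(b"\xfe\xff\x03\xa9".decode("utf-16"))  # omega again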
At 04:58 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote:
If UCS-2LE is a *standard* encoding (and it is in fact mentioned in UTR-17),
how do VFAT directories qualify as a "higher level protocol"?
My understanding of "higher level protocol" is that it is a *non* standard
usage of some kind, allowed
At 07:14 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote:
Why does it say there are three varieties when a 16-bit datum can only be
serialised in two orders? If the scheme UTF-16 doesn't have a BOM, isn't it
just one of the other two? When it does have a BOM, it can still be
serialised in two ways, so
Jony Rosenne, who has been a great contributor since or before the
beginning, wrote in an off moment:
UTF-8 is a biased transformation format designed to save American and
Western Europeans storage space and to give some people a warm feeling by
keeping Unicode in the familiar 8 bit world.
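The kernel of truth in that jab is easy to quantify with a quick Python
comparison (the sample strings are arbitrary):

    # Bytes needed for the same text in each encoding form.
    for text in ("hello", "\u03b3\u03b5\u03b9\u03b1", "\u65e5\u672c\u8a9e"):
        print(text,
              len(text.encode("utf-8")), "bytes in UTF-8 vs",
              len(text.encode("utf-16-be")), "in UTF-16")
    # ASCII: 5 vs 10; Greek: 8 vs 8; Japanese: 9 vs 6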
On 07/21/2000 12:55:59 PM [EMAIL PROTECTED] wrote:
The problem is that the labels were invented to tag data streams, not to
'label' the result of auto-detection. As you point out, there are 4 results
of auto-detection:
UTF-16, no BOM
UTF-16, no BOM, but arriving in reverse byte order (for my
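A toy detector along the following lines (Python, purely illustrative; the
function name is made up) shows why four outcomes fall out:

    BOM_BE, BOM_LE = b"\xfe\xff", b"\xff\xfe"

    def sniff_utf16(data: bytes) -> str:
        # Guess the flavor of 16-bit Unicode text.
        if data.startswith(BOM_BE):
            return "UTF-16, BOM, big-endian"
        if data.startswith(BOM_LE):
            return "UTF-16, BOM, little-endian"
        # No BOM: for mostly-ASCII text the zero octets sit high
        # (big-endian) or low (little-endian).
        if data[:1] == b"\x00":
            return "UTF-16, no BOM"
        return "UTF-16, no BOM, reverse byte order"

    print(sniff_utf16("Hi".encode("utf-16-be")))  # UTF-16, no BOM
    print(sniff_utf16("Hi".encode("utf-16-le")))  # ... reverse byte order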
Recently I've had the dubious pleasure of delving into the details of
the VFAT file system. For long file names, I thought it used UCS-2,
but in looking at the data with a disk editor, it appears to be
byte-swapped (little-endian). I thought that UCS-2 was by definition
big endian, thus
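Anyone who wants to reproduce the observation can do it in two lines of
Python; the file name here is arbitrary, but the hex dump shows exactly the
low-byte-first pattern a disk editor reveals in a VFAT long-name entry:

    name = "ReadMe.txt"
    print(name.encode("utf-16-le").hex(" ", 2))
    # 5200 6500 6100 6400 4d00 6500 2e00 7400 7800 7400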
Hi Addison,
UCS-2 is pretty close to the same thing as UTF-16. The differences do not
apply here.
UCS-2 can be big-endian or little-endian. The rule is that BE is the
default. However, on Intel platforms, you shouldn't be surprised to see LE
everywhere: that's the architecture. Microsoft is
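Moving between the two views is a straight octet swap within each 16-bit
code unit; a minimal Python sketch (the helper name is invented):

    def swap_utf16(data: bytes) -> bytes:
        # Swap the two octets of every 16-bit code unit (BE <-> LE).
        if len(data) % 2:
            raise ValueError("odd-length UTF-16 data")
        swapped = bytearray(len(data))
        swapped[0::2] = data[1::2]
        swapped[1::2] = data[0::2]
        return bytes(swapped)

    assert swap_utf16("Test".encode("utf-16-be")) == "Test".encode("utf-16-le")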
At 11:34 AM 7/20/00 -0800, John Cowan wrote:
1. Could it be using UTF-16LE? I tried creating an entry with a
surrogate pair, but the name was displayed with two black boxes on a
Windows 2000-based computer, so I assumed that surrogates were not
supported.
Probably not. So technically it
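The UCS-2/UTF-16 distinction the black boxes hint at is easy to demonstrate
in Python (the example character, U+1D11E MUSICAL SYMBOL G CLEF, postdates
this thread and is used purely as an illustration):

    # Outside the BMP, UTF-16 needs a surrogate pair: D834 DD1E.
    clef = "\U0001D11E"
    units = clef.encode("utf-16-be")
    print(units.hex(" ", 2))  # d834 dd1e

    # A strict UCS-2 reader sees two unknown 16-bit characters,
    # hence the two black boxes in the display.
    print([hex(int.from_bytes(units[i:i + 2], "big")) for i in (0, 2)])
    # ['0xd834', '0xdd1e']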
At 11:41 AM 7/20/00 -0800, Ken Krugler wrote:
No. UCS-2 and UCS-4 have always been big-endian. Read ISO 10646-1:1993,
section "6.3 Octet order" (page 7):
When serialized as octets, a more significant octet shall
precede less significant octets.
The section continues: "When not serialized
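That is the same big-endian rule that, for instance, Python's struct module
spells ">":

    import struct

    # U+0041: more significant octet first, per ISO 10646 6.3.
    print(struct.pack(">H", 0x0041).hex())  # 0041
    print(struct.pack("<H", 0x0041).hex())  # 4100 (the swapped view)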
Well...
There has always been a BOM in Unicode and it's there for a reason: to
indicate the byte order on different processors. There is an inherent BE
bias in Unicode. But this doesn't invalidate an LE view of the Universe.
Avoiding for the moment the word-parsing that Markus suggests, Unicode
Addison Phillips [EMAIL PROTECTED] wrote:
Avoiding for the moment the word-parsing that Markus suggests, Unicode
on Microsoft platforms has always been LE (at least on Intel) and they
have called the encoding they use "UCS-2" (when they bothered with
such things: in the past they always