Re: Unicode in VFAT file system

2000-07-26 Thread addison
Well... there was only one Unicode in those days. But the vagueness persisted after its time. This is fine in the consumer documentation, where it really doesn't matter. But in the development docs it is a real problem. Of course, I understand that software development cycles, the size of the

Re: Unicode in VFAT file system

2000-07-26 Thread addison
Actually, the problem is the *same old thing*: no education about I18N issues in general. There are all sorts of interesting "biases" about Unicode related to the still lamentable level of I18N training that the average developer receives. It's simply shocking. Best regards, Addison On Wed,

RE: Unicode in VFAT file system

2000-07-22 Thread Jonathan Rosenne
I apologize. Jony -----Original Message----- From: Becker [mailto:Becker] Sent: Friday, July 21, 2000 10:34 PM To: Unicode List Cc: Myself Subject: RE: Unicode in VFAT file system Jony Rosenne, who has been a great contributor since or before the beginning, wrote in an off moment

RE: Unicode in VFAT file system

2000-07-21 Thread Jonathan Rosenne
-----Original Message----- From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]] Sent: Friday, July 21, 2000 9:44 AM To: Unicode List Subject: Re: Unicode in VFAT file system Although there is some truth here the fact is that it is not really true today that everyone equates the two. The default thought

Re: Unicode in VFAT file system

2000-07-21 Thread Peter_Constable
In the meantime, Microsoft is still pretty firmly rooted in the idea that Unicode=UCS-2 (or UTF-16le on Windows 2000). I don't think we can make a blanket statement about MS being firmly rooted in UCS-2. They're very big and manage lots of code that a lot of people use on a regular basis, and

Re: Unicode in VFAT file system

2000-07-21 Thread Peter_Constable
On 07/21/2000 04:42:05 AM [EMAIL PROTECTED] wrote: Unicode is the code, which is based on 16 bit chunks of ether or whatever, and UTF-8 is a biased transformation format... That's too simple to capture the current reality, as others have been indicating. The full story is available in UTR17,
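UTR #17 separates the abstract code point from its encoding forms (sequences of code units) and its encoding schemes (byte serializations). As a rough illustration only, assuming Python 3 and its standard codecs, the sketch below takes one character and shows its UTF-8, UTF-16 and UTF-32 code units next to the two UTF-16 byte orders. The helper name `encoding_layers` is hypothetical.

```python
def encoding_layers(ch: str) -> dict:
    """Show one character at the code point, code unit and byte levels."""
    if len(ch) != 1:
        raise ValueError("expects exactly one character")
    be = ch.encode("utf-16-be")
    return {
        "code point": ord(ch),
        # Encoding forms: sequences of 8-, 16- and 32-bit code units
        "UTF-8 units": list(ch.encode("utf-8")),
        "UTF-16 units": [int.from_bytes(be[i:i + 2], "big") for i in range(0, len(be), 2)],
        "UTF-32 units": [ord(ch)],
        # Encoding schemes: the same UTF-16 units serialized in each byte order
        "UTF-16BE bytes": be,
        "UTF-16LE bytes": ch.encode("utf-16-le"),
    }

for ch in ("\u00e9", "\U00010400"):
    print(encoding_layers(ch))
```

U+10400 needs two UTF-16 code units but only one UTF-32 unit, which is roughly where the "16 bit chunks" picture stops being enough.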

RE: Unicode in VFAT file system

2000-07-21 Thread Marco.Cimarosti
Asmus Freytag wrote: At 09:53 AM 7/20/00 -0800, Ken Krugler wrote: 2. Is little-endian UCS-2 a valid encoding that I just don't know about? Yes, it is. Your example of the VFAT system is a near perfect case, since the details of it form what Unicode calls a 'Higher level protocol' and

Re: Unicode in VFAT file system

2000-07-21 Thread Mark Davis
Unicode has changed and evolved over the years. At this point, UCS-2 is a funny beast, because it shares precisely the same encoding space as UTF-16. That is, in code units there is absolutely no difference between them. The only real difference is whether you interpret the code units in the
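A minimal Python sketch of that point, assuming the usual reading of the two names: a UCS-2 reader takes every 16-bit unit as a character on its own, while a UTF-16 reader combines a high surrogate followed by a low surrogate into one supplementary code point. The function names are illustrative only.

```python
def read_as_ucs2(units: list[int]) -> list[int]:
    """UCS-2 view: each 16-bit code unit is taken as a character value."""
    return list(units)

def read_as_utf16(units: list[int]) -> list[int]:
    """UTF-16 view: a high+low surrogate pair becomes one code point."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            out.append(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP character, or an unpaired surrogate left as-is
            i += 1
    return out

units = [0x0041, 0xD801, 0xDC00]  # "A" followed by U+10400, as UTF-16 code units
print([hex(c) for c in read_as_ucs2(units)])   # ['0x41', '0xd801', '0xdc00']
print([hex(c) for c in read_as_utf16(units)])  # ['0x41', '0x10400']
```

The input list is the same in both calls; only the interpretation changes.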

Re: Unicode in VFAT file system

2000-07-21 Thread Peter_Constable
As a serialization, UTF-16 has three forms: UTF-16, UTF-16BE, and UTF-16LE. The first is with (optionally) a BOM, and the others without. I know this is what the Standard dictates, and I think I understand why, but it doesn't make complete sense to the novice trying to find his/her way: novice
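Python's standard codecs happen to use the same three names, which makes the distinction easy to see. A short sketch, assuming Python 3: the `utf-16` codec writes a BOM (in the machine's native order), while `utf-16-be` and `utf-16-le` write none, and when decoding they keep a leading U+FEFF as an ordinary character instead of treating it as a BOM.

```python
import codecs

text = "A"

labelled = text.encode("utf-16")    # BOM + native order, e.g. b'\xff\xfeA\x00' on x86
big = text.encode("utf-16-be")      # b'\x00A', no BOM
little = text.encode("utf-16-le")   # b'A\x00', no BOM

# "UTF-16": the BOM is consumed and used to pick the byte order.
print((codecs.BOM_BE + big).decode("utf-16"))           # A
# "UTF-16BE": the order is fixed by the label, so U+FEFF stays in the text.
print(repr((codecs.BOM_BE + big).decode("utf-16-be")))  # '\ufeffA'
```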

Re: Unicode in VFAT file system

2000-07-21 Thread John Cowan
[EMAIL PROTECTED] wrote: Why does it say there are three varieties when a 16-bit datum can only be serialised in two orders? The simplest way to think about it is to remember that a MIME charset is meant to provide *minimal* information for the receiver to convert bytes into characters. If
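As a sketch of the minimal rule a receiver applies to the plain "UTF-16" label, assuming the reading in RFC 2781 (section 4.3): a leading FE FF or FF FE selects the byte order and is removed, and data with no BOM is treated as big-endian. The function name is hypothetical. CPython's built-in `utf-16` decoder falls back to the machine's native order rather than big-endian when there is no BOM, which is why the check is written out here.

```python
import codecs

def decode_utf16_label(data: bytes) -> str:
    """Decode bytes tagged charset=UTF-16: the BOM decides the order, else big-endian."""
    if data.startswith(codecs.BOM_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")      # no BOM: default to big-endian

print(decode_utf16_label(b"\xfe\xff\x00A"))         # A
print(decode_utf16_label(b"\xff\xfeA\x00"))         # A
print(decode_utf16_label(b"\x00A"))                 # A
print(hex(ord(decode_utf16_label(b"A\x00"))))       # 0x4100
```

The last line is the case the separate "UTF-16LE" label exists for: little-endian data sent with neither a BOM nor an explicit label gets misread as a different character.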

RE: Unicode in VFAT file system

2000-07-21 Thread Asmus Freytag
At 04:58 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote: If UCS-2LE is a *standard* encoding (and it is in fact mentioned in UTR-17), how do VFAT directories qualify as a "higher level protocol"? My understanding of "higher level protocol" is that it is a *non* standard usage of some kind, allowed

Re: Unicode in VFAT file system

2000-07-21 Thread Asmus Freytag
At 07:14 AM 7/21/00 -0800, [EMAIL PROTECTED] wrote: Why does it say there are three varieties when a 16-bit datum can only be serialised in two orders? If the scheme UTF-16 doesn't have a BOM, isn't it just one of the other two? When it does have a BOM, it can still be serialised in two ways, so

RE: Unicode in VFAT file system

2000-07-21 Thread Becker, Joseph
Jony Rosenne, who has been a great contributor since or before the beginning, wrote in an off moment: UTF-8 is a biased transformation format designed to save American and Western Europeans storage space and to give some people a warm feeling by keeping Unicode in the familiar 8 bit world.
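Whatever one makes of the "biased" charge, the storage trade-off itself is easy to measure. A small sketch, assuming Python 3; the sample words are arbitrary and only illustrate per-script costs: ASCII takes 1 byte per character in UTF-8 against 2 in UTF-16, Latin-1 accented letters and Hebrew letters take 2 against 2, and most CJK ideographs take 3 against 2.

```python
def byte_counts(text: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-16 length) in bytes, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# hello, "ete" with accents, Hebrew shalom, Chinese zhongwen
for word in ("hello", "\u00e9t\u00e9", "\u05e9\u05dc\u05d5\u05dd", "\u4e2d\u6587"):
    utf8, utf16 = byte_counts(word)
    print(f"{word}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```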

Re: Unicode in VFAT file system

2000-07-21 Thread Peter_Constable
On 07/21/2000 12:55:59 PM [EMAIL PROTECTED] wrote: The problem is that the labels were invented to tag data streams, not to 'label' the result of autodetection. As you point out there are 4 results of auto-detection: UTF-16, no BOM UTF-16, no BOM, but arriving in reverse byte order (for my

RE: Unicode in VFAT file system

2000-07-20 Thread Yves Arrouye
Recently I've had the dubious pleasure of delving into the details of the VFAT file system. For long file names, I thought it used UCS-2, but in looking at the data with a disk editor, it appears to be byte-swapping (little endian). I thought that UCS-2 was by definition big endian, thus
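To see the byte order described above, here is a hypothetical Python sketch that pulls the name characters out of a single 32-byte VFAT long-file-name (LFN) directory entry. It assumes the commonly documented LFN layout (attribute byte 0x0F at offset 11, and 13 name characters split across offsets 1, 14 and 28), reads each 16-bit unit little-endian, and treats each unit as one character, i.e. as UCS-2. Names are illustrative; sequence numbers and checksums are ignored.

```python
import struct

# Assumed LFN layout: (byte offset, number of 16-bit name units) per fragment
LFN_NAME_FIELDS = ((1, 5), (14, 6), (28, 2))

def lfn_entry_chars(entry: bytes) -> str:
    """Return the name characters stored in one LFN directory entry."""
    if len(entry) != 32:
        raise ValueError("directory entries are 32 bytes long")
    if entry[11] != 0x0F:
        raise ValueError("not an LFN entry (attribute byte is not 0x0F)")
    units = []
    for offset, count in LFN_NAME_FIELDS:
        # '<' = little-endian: each unit is stored low byte first
        units.extend(struct.unpack_from(f"<{count}H", entry, offset))
    chars = []
    for u in units:
        if u == 0x0000:  # end of name; 0xFFFF padding follows
            break
        chars.append(chr(u))  # UCS-2 reading: one unit, one character
    return "".join(chars)
```

In a disk editor, the letter "R" in such an entry shows up as the bytes 52 00, which is the "byte-swapped" look the post describes.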

Re: Unicode in VFAT file system

2000-07-20 Thread Asmus Freytag
At 09:53 AM 7/20/00 -0800, Ken Krugler wrote: 2. Is little-endian UCS-2 a valid encoding that I just don't know about? Yes, it is. Your example of the VFAT system is a near perfect case, since the details of it form what Unicode calls a 'Higher level protocol' and those may legitimately override

Re: Unicode in VFAT file system

2000-07-20 Thread Ken Krugler
Hi Addison, UCS-2 is pretty close to the same thing as UTF-16. The differences do not apply here. UCS-2 can be big-endian or little-endian. The rule is that BE is the default. However, on Intel platforms, you shouldn't be surprised to see LE everywhere: that's the architecture. Microsoft is
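A small sketch of the distinction, assuming Python 3: the same 16-bit code unit serialized big-endian (the default the post mentions) and little-endian, compared with the machine's own in-memory order. It assumes `array('H')` items are 2 bytes, which holds on mainstream platforms.

```python
import struct
import sys
from array import array

unit = 0x00E9                          # 'é' as one 16-bit code unit

big = struct.pack(">H", unit)          # b'\x00\xe9': big-endian, the default serialization
little = struct.pack("<H", unit)       # b'\xe9\x00': little-endian, as stored on Intel CPUs
native = array("H", [unit]).tobytes()  # this machine's in-memory order

print(sys.byteorder, native == (little if sys.byteorder == "little" else big))
```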

Re: Unicode in VFAT file system

2000-07-20 Thread Asmus Freytag
At 11:34 AM 7/20/00 -0800, John Cowan wrote: 1. Could it be using UTF-16LE? I tried creating an entry with a surrogate pair, but the name was displayed with two black boxes on a Windows 2000-based computer, so I assumed that surrogates were not supported. Probably not. So technically it
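For reference, a supplementary code point maps to a surrogate pair as follows (a short Python sketch; the function name is illustrative). Software that handles only UCS-2 sees the two resulting code units as two separate values with no character of their own, which is consistent with the two black boxes described above.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1D11E)     # MUSICAL SYMBOL G CLEF
print(f"U+1D11E -> {high:04X} {low:04X}")  # U+1D11E -> D834 DD1E
```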

Re: Unicode in VFAT file system

2000-07-20 Thread Asmus Freytag
At 11:41 AM 7/20/00 -0800, Ken Krugler wrote: No. UCS-2 and UCS-4 have always been bigendian. Read ISO 10646-1:1993, section "6.3 Octet order" (page 7): When serialized as octets, a more significant octet shall precede less significant octets. The section continues: "When not serialized

Re: Unicode in VFAT file system

2000-07-20 Thread addison
Well... There has always been a BOM in Unicode and it's there for a reason: to indicate the byte order on different processors. There is an inherent BE bias in Unicode. But this doesn't invalidate an LE view of the Universe. Avoiding for the moment the word-parsing that Markus suggests, Unicode

Re: Unicode in VFAT file system

2000-07-20 Thread Doug Ewell
Addison Phillips [EMAIL PROTECTED] wrote: Avoiding for the moment the word-parsing that Markus suggests, Unicode on Microsoft platforms has always been LE (at least on Intel) and they have called the encoding they use "UCS-2" (when they bothered with such things: in the past they always