Hi Ken,

UCS-2 is pretty close to the same thing as UTF-16: they differ only in
how surrogate pairs are treated, and that difference has no bearing on
byte order.

UCS-2 can be big-endian or little-endian. The rule is that BE is the
default when there is no BOM. However, on Intel platforms, you shouldn't
be surprised to see LE everywhere: that's the architecture's native byte
order. Microsoft is saving two bytes for every filename by not storing a
BOM.
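
To make the byte-order point concrete, here is a minimal Python sketch of
what the two serializations look like in a hex dump. The file name "A.txt"
is a made-up example, not something taken from a real VFAT directory
entry, and the sketch says nothing about the on-disk layout beyond byte
order.

```python
import codecs

# Hypothetical file name, chosen only for illustration.
name = "A.txt"

# The same 16-bit code units, serialized in each byte order.
be = name.encode("utf-16-be")
le = name.encode("utf-16-le")

print(be.hex(" "))  # 00 41 00 2e 00 74 00 78 00 74
print(le.hex(" "))  # 41 00 2e 00 74 00 78 00 74 00

# A BOM is the code unit U+FEFF written in the chosen byte order.
# Prepending it costs exactly two bytes per string.
with_bom = codecs.BOM_UTF16_LE + le
print(len(with_bom) - len(le))  # 2

# The generic "utf-16" codec reads the BOM to pick the byte order.
print(with_bom.decode("utf-16") == name)  # True
```

The second dump, with the low byte first, is the pattern a disk editor
shows for little-endian data.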

You should note that Microsoft *means* UCS-2LE (and UTF-16LE in more
modern systems) when they say "Unicode" (at least on Intel platforms).

So:

1. Yes, it is perfectly valid.
2. There are no characters assigned in the surrogate space just yet, so a
black square should be no surprise. Two black squares mean that the pair
is being treated as two separate UCS-2 code units rather than as one
UTF-16 character.
3. Filenames are, by definition in Windows-land, UPPERCASE in Western
European systems, which is why case is ignored in the U+0000 to U+00FF
range. Other scripts either don't have the concept of case or weren't
mucked with. That includes compatibility characters stored outside the
U+0000 to U+00FF range.
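
As a rough illustration of point 2, the Python sketch below shows why one
character beyond U+FFFF can turn into two boxes. U+10000 is used here
purely as a convenient number above U+FFFF, and is_surrogate is a helper
name made up for this example; none of this is taken from Windows itself.

```python
import struct

# Assumption: U+10000 is just a convenient code point above U+FFFF.
ch = "\U00010000"

raw = ch.encode("utf-16-le")        # 4 bytes: two 16-bit code units
units = struct.unpack("<2H", raw)   # read them as little-endian uint16


def is_surrogate(unit):
    """Hypothetical helper: True if the code unit is in D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF


print([hex(u) for u in units])               # ['0xd800', '0xdc00']
print(all(is_surrogate(u) for u in units))   # True

# UCS-2 view: two independent code units, neither one a character.
print(len(units))                            # 2

# UTF-16 view: the pair combines into a single code point.
print(len(raw.decode("utf-16-le")))          # 1
```

A renderer that takes the UCS-2 view sees two units with no glyph, hence
two boxes; one that takes the UTF-16 view would show at most one.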

Regards,

Addison

===========================================================
Addison P. Phillips                    Principal Consultant
Inter-Locale LLC                http://www.inter-locale.com
Los Gatos, CA, USA          mailto:[EMAIL PROTECTED]

+1 408.210.3569 (mobile)              +1 408.904.4762 (fax)
===========================================================
Globalization Engineering & Consulting Services

On Thu, 20 Jul 2000, Ken Krugler wrote:

> Hi Unicoders,
> 
> Recently I've had the dubious pleasure of delving into the details of 
> the VFAT file system. For long file names, I thought it used UCS-2, 
> but in looking at the data with a disk editor, it appears to be 
> byte-swapping (little endian). I thought that UCS-2 was by definition 
> big endian, thus I've got the following questions:
> 
> 1. Could it be using UTF-16LE? I tried creating an entry with a 
> surrogate pair, but the name was displayed with two black boxes on a 
> Windows 2000-based computer, so I assumed that surrogates were not 
> supported.
> 
> 2. Is little-endian UCS-2 a valid encoding that I just don't know about?
> 
> 3. And finally, why are file names case-insensitive for characters in 
> the U-0000 to U-00FF range, but not for any other characters? OK, 
> maybe I can guess at the answer to that one...
> 
> Thanks,
> 
> -- Ken
> Ken Krugler
> TransPac Software, Inc.
> <http://www.transpac.com>
> +1 530-470-9200
> 
