On Sun, 26 Oct 2003 23:36:55 -0500
"Mrs. Brisby" <[EMAIL PROTECTED]> wrote:
> It's good to use null-terminated in many cases; especially in collating
> and sorting. It helps to understand that in those cases you stop
> processing _after_ you see the terminator (and treat the terminator as
> it is: zero.)

Collation also involves length. If the length of the data is known before
scanning it, in some cases you can skip an entry whose length doesn't match
without scanning the data body at all. It helps to understand that in those
cases you stop processing _before_ you see the terminator or anything else.
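
To make that concrete, here is a rough sketch in Python. It only covers
equality tests, not ordering, and the function names and the stats counter
are made up for illustration. The counted version rejects a mismatch from
the lengths alone; the terminated version has to walk the bytes until they
differ or it reaches the zero.

```python
def equal_counted(a, b, stats):
    """Equality test for strings whose lengths are known up front."""
    if len(a) != len(b):
        # Different lengths can never be equal: skip without reading the body.
        stats["skipped"] += 1
        return False
    stats["scanned"] += 1
    return a == b


def equal_terminated(a, b):
    """Equality test for NUL-terminated buffers.

    Returns (equal, bytes_examined). Both buffers must end in a zero byte.
    """
    i = 0
    while True:
        if a[i] != b[i]:
            return False, i + 1
        if a[i] == 0:
            # Both buffers hit the terminator at the same position.
            return True, i + 1
        i += 1


stats = {"skipped": 0, "scanned": 0}
print(equal_counted(b"alpha", b"alphabet", stats), stats)
print(equal_terminated(b"alpha\0", b"alphabet\0"))
```

In the second call the mismatch is only found once the scan reaches the
terminator of the shorter string, which is the extra work a known length
avoids.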

> UTF-16 is NOT used in HFS+. HFS+ still uses ASCII with some "tricks".
> UFS is what's "preferred" in MacOS X, and it doesn't use UTF-16 either.
> UTF-16 isn't what we're talking about anyway, it's UCS16.

Thank you for the clarification. I'd like to hear more about those
imaginative "tricks", but I'm afraid it's OT.

MacOS X uses Unicode as its native encoding, and the Unicode encoding
form it uses most is UTF-16. It uses UTF-8 only when calling the BSD APIs.
It's a kind of hybrid: UTF-8 is there just for compatibility with the Unix
parts of MacOS X, while the non-Unix pieces, the ones that make MacOS X a
Mac, use UTF-16 internally, including Carbon, Cocoa and ATSUI.

For HFS+, from Apple's Technical Note TN2078 (Migrating to FSRefs & long
Unicode names from FSSpecs):
http://developer.apple.com/technotes/tn2002/tn2078.html

"How file names are encoded
HFS+ disks store file names as UTF-16 in an Apple-modified form
of Normalization Form D (decomposed). This form excludes certain
compatibility decompositions and parts of the symbol blocks, in order
to assure round-trip of file names to Mac OS encodings (applications
using the HFS APIs assume they get the same bytes out that they
put in).

In Mac OS X 10.2, the decomposition rules used were changed from
Unicode 2.0.x (based on an intermediate draft) plus the
above-mentioned Apple modifications, to Unicode 3.2 plus the
above-mentioned Apple modifications. The Unicode Consortium has
committed to not changing the decomposition rules after Unicode 3.2,
so we shouldn't have to do this again. The change from 2.0.x to 3.2 was
necessary because A) lots of new decompositions had been added,
and B) the 2.0.x data was full of errors.

Other file systems use different storage formats. UFS disks use
UTF-8, HFS disks use Mac OS encodings. AFP (AppleShare) uses
Mac OS encodings prior to 3.0, and UTF-16 for 3.0 or later. "
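
As a rough illustration of what the technote describes, the Python sketch
below decomposes a file name and encodes it both ways. Note the
assumptions: it uses the standard Unicode NFD from Python's unicodedata
module, not Apple's modified form, and the example file name and the
big-endian byte order are my own choices, not something the technote
states.

```python
import unicodedata

# Hypothetical file name containing a precomposed e-acute (U+00E9).
name = "caf\u00e9.txt"

# Standard NFD splits U+00E9 into "e" followed by U+0301 (combining acute).
# Apple's variant excludes some decompositions; that is not modelled here.
nfd = unicodedata.normalize("NFD", name)

# The same decomposed name as UTF-16 code units and as UTF-8 bytes.
utf16 = nfd.encode("utf-16-be")
utf8 = nfd.encode("utf-8")

print(len(name), len(nfd))    # code points before and after decomposition
print(len(utf16), len(utf8))  # encoded sizes in bytes
print(utf16.hex(" ", 2))      # one group per UTF-16 code unit
```

The decomposed name is one code point longer than the original, and the
two encodings of it differ in size, so a length taken from one form can't
be reused for the other.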

-- KL

