On May 23, 2008, at 8:20 AM, Michael Adams wrote:
> This bit i am fairly hazy about: UTF-16 allows 256 * 256 or 65500+
> characters and UTF-32 allows 256 * 256 * 256 * 256 characters and are
> International standards.

Not precisely. UTF-16 allows 256 * 256 - 2048 + 1024 * 1024, or
1,112,064 characters, 63,488 being two bytes, and 1,048,576 being four
bytes. 1,024 of the 65,536 possible two-byte codes are reserved to be
used as the first half of a four-byte character, and another 1,024 as
the second half.
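
The arithmetic above can be checked with a short Python sketch. The names (bmp_characters, to_surrogate_pair, and so on) are made up for illustration, and the constants 0xD800 and 0xDC00 are the starts of the two reserved ranges of 1,024 codes as defined by Unicode, which the text above does not spell out.

```python
# Count the characters reachable through UTF-16.
TWO_BYTE_CODES = 256 * 256        # 65,536 possible two-byte codes
RESERVED = 2 * 1024               # 1,024 first halves + 1,024 second halves
bmp_characters = TWO_BYTE_CODES - RESERVED             # two-byte characters
four_byte_characters = 1024 * 1024                     # first half x second half
total_characters = bmp_characters + four_byte_characters


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its two reserved 16-bit halves."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are stored as a pair")
    offset = code_point - 0x10000          # a 20-bit value
    first = 0xD800 + (offset >> 10)        # top 10 bits select the first half
    second = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits select the second half
    return first, second


print(bmp_characters, four_byte_characters, total_characters)
print([hex(half) for half in to_surrogate_pair(0x1F600)])
```

Each half carries 10 bits, which is where the 1024 * 1024 term comes from.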
UTF-32 allows only the same 1,112,064 characters. UTF-32 is obviously
wasteful, and is not meant to be used except in cases where you want
to be able to find the nth character in a string without counting.
(You can do the same thing with UTF-16 if all the characters fit in
the base 63,488, the Basic Multilingual Plane; that will usually be
the case unless you're using something rare, such as Egyptian
hieroglyphs or rare Chinese characters.)
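
As a rough illustration of "finding the nth character without counting", here is a Python sketch that works on raw encoded bytes. The function names are hypothetical, the input is assumed to be big-endian with no byte-order mark, and there is no bounds or validity checking.

```python
def nth_char_utf32(data, n):
    """Character n of UTF-32 bytes: every character is 4 bytes, so index directly."""
    return data[4 * n:4 * n + 4].decode("utf-32-be")


def is_first_half(unit):
    """True if a 16-bit code is the first half of a four-byte character."""
    return 0xD800 <= unit <= 0xDBFF


def nth_char_utf16(data, n):
    """Character n of UTF-16 bytes: must walk from the start, skipping pairs."""
    pos = 0
    for _ in range(n):
        unit = int.from_bytes(data[pos:pos + 2], "big")
        pos += 4 if is_first_half(unit) else 2
    unit = int.from_bytes(data[pos:pos + 2], "big")
    width = 4 if is_first_half(unit) else 2
    return data[pos:pos + width].decode("utf-16-be")


text = "a\U00013000b"   # U+13000 is an Egyptian hieroglyph, outside the base 63,488
print(nth_char_utf32(text.encode("utf-32-be"), 2))
print(nth_char_utf16(text.encode("utf-16-be"), 2))
```

If every character is known to be in the base 63,488, the UTF-16 version could index directly with 2 * n, just as the UTF-32 version does with 4 * n.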
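UTF-8 also allows only the same 1,112,064 characters, in one, two,
three, or four bytes. UTF-8 takes less space than UTF-16 if most of
the characters are in US-ASCII (one byte each instead of two), about
the same for scripts such as Greek, Cyrillic, Hebrew, or Arabic (two
bytes each), and more for the rest of the base 63,488, such as Chinese
or Japanese (three bytes each instead of two).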
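
The space trade-off is easy to measure. This is a minimal Python sketch with a hypothetical helper, encoded_sizes, and sample strings chosen only for illustration; the "-be" codec names are used so that no byte-order mark is counted.

```python
def encoded_sizes(text):
    """Return the number of bytes text occupies in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
        "utf-32": len(text.encode("utf-32-be")),
    }


for sample in ["hello", "αβγ", "日本語", "\U00013000"]:
    print(repr(sample), encoded_sizes(sample))
```

US-ASCII text comes out smallest in UTF-8, while the Japanese sample comes out smallest in UTF-16.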
--
John W Kennedy
"You can, if you wish, class all science-fiction together; but it is
about as perceptive as classing the works of Ballantyne, Conrad and W.
W. Jacobs together as the 'sea-story' and then criticizing _that_."
-- C. S. Lewis. "An Experiment in Criticism"