Re: [users] Character Encodings

John W Kennedy Fri, 23 May 2008 11:00:15 -0700

On May 23, 2008, at 8:20 AM, Michael Adams wrote:


> This bit I am fairly hazy about: UTF-16 allows 256 * 256 or 65500+
> characters and UTF-32 allows 256 * 256 * 256 * 256 characters and are
> International standards.

Not precisely. UTF-16 allows 256 * 256 - 2048 + 1024 * 1024, or 1,112,064 characters, 63,488 being two bytes, and 1,048,576 being four bytes. 1024 characters out of the 65,536 possible two-byte codes are reserved to be used as the first half of a four-byte character, and another 1024 as the second half.
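To make the arithmetic concrete, here is a small Python sketch (not from the original post) that recomputes the totals and splits a code point above U+FFFF into its two 16-bit halves using the standard surrogate-pair formula. The function name to_surrogate_pair is made up for illustration; the result is cross-checked against Python's built-in "utf-16-be" codec.

```python
# Capacity arithmetic: all 16-bit values minus the 2048 reserved "halves",
# plus every combination of 1024 first halves with 1024 second halves.
TWO_BYTE = 256 * 256 - 2048      # code points that fit in one 16-bit unit
FOUR_BYTE = 1024 * 1024          # code points that need a pair of units
TOTAL = TWO_BYTE + FOUR_BYTE


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a pair")
    v = cp - 0x10000              # 20-bit value in 0..0xFFFFF
    high = 0xD800 + (v >> 10)     # top 10 bits pick one of 1024 first halves
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits pick one of 1024 second halves
    return high, low


if __name__ == "__main__":
    print(TWO_BYTE, FOUR_BYTE, TOTAL)   # 63488 1048576 1112064
    cp = 0x13000                        # first code point of the Egyptian Hieroglyphs block
    high, low = to_surrogate_pair(cp)
    print(hex(high), hex(low))          # 0xd80c 0xdc00
    # Cross-check against Python's own big-endian UTF-16 encoder (no BOM).
    expected = high.to_bytes(2, "big") + low.to_bytes(2, "big")
    assert chr(cp).encode("utf-16-be") == expected
```

Every code point from U+10000 to U+10FFFF maps to exactly one (high, low) pair, which is where the 1024 * 1024 figure comes from.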

UTF-32 allows only the same 1,112,064 characters. UTF-32 is obviously wasteful, and is not meant to be used except in cases where you want to be able to find the nth character in a string without counting. (You can do the same thing with UTF-16 if all the characters fit in the basic 63,488, the Basic Multilingual Plane, which will usually be the case unless you're using something rare, such as Egyptian hieroglyphics or rare Chinese characters.)
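A rough Python sketch of the difference (the helper names are hypothetical, and little-endian byte order is assumed throughout): in UTF-32 the nth character is always at byte offset 4 * n, while in UTF-16 you have to walk the string and treat each surrogate pair as one character, unless you already know every character is in the basic range.

```python
def nth_char_utf32(data: bytes, n: int) -> str:
    """Fixed width: the nth code point always starts at byte 4 * n."""
    return data[4 * n : 4 * n + 4].decode("utf-32-le")


def _is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def nth_char_utf16(data: bytes, n: int) -> str:
    """Variable width: count 16-bit units, skipping a whole pair when one starts."""
    i = 0
    for _ in range(n):
        unit = int.from_bytes(data[i : i + 2], "little")
        i += 4 if _is_high_surrogate(unit) else 2
    unit = int.from_bytes(data[i : i + 2], "little")
    width = 4 if _is_high_surrogate(unit) else 2
    return data[i : i + width].decode("utf-16-le")


def is_bmp_only(text: str) -> bool:
    """True when UTF-16 needs no pairs, so offset 2 * n works like UTF-32's 4 * n."""
    return all(ord(c) <= 0xFFFF for c in text)


if __name__ == "__main__":
    text = "a\U00013000b"   # the middle character needs four bytes in UTF-16
    print(nth_char_utf32(text.encode("utf-32-le"), 2))   # b
    print(nth_char_utf16(text.encode("utf-16-le"), 2))   # b
    print(is_bmp_only(text))                             # False
```

The UTF-32 lookup is a single slice regardless of n; the UTF-16 lookup takes time proportional to n, which is the "counting" the post refers to.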

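UTF-8 also allows only the same 1,112,064 characters, in one, two, three, or four bytes. UTF-8 takes less space than UTF-16 if most of the characters are in US-ASCII, the same space for characters up to U+07FF (which covers Greek, Cyrillic, Hebrew and Arabic, among others), and more space for the rest of the basic range (U+0800 to U+FFFF), which includes most Chinese, Japanese and Korean text.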
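A short Python sketch of the per-character byte counts (utf8_len and utf16_len are illustrative helpers, not library functions), with each total checked against Python's built-in encoders:

```python
def utf8_len(cp: int) -> int:
    """Bytes UTF-8 needs for one code point."""
    if cp < 0x80:
        return 1      # US-ASCII
    if cp < 0x800:
        return 2      # e.g. Latin-1 accents, Greek, Cyrillic, Hebrew, Arabic
    if cp < 0x10000:
        return 3      # rest of the basic range, including most CJK
    return 4          # everything that needs a surrogate pair in UTF-16


def utf16_len(cp: int) -> int:
    """Bytes UTF-16 needs for one code point: one unit or a pair."""
    return 2 if cp < 0x10000 else 4


samples = {"English": "Hello, world", "Greek": "Καλημέρα", "Chinese": "你好世界"}
for name, text in samples.items():
    u8 = sum(utf8_len(ord(c)) for c in text)
    u16 = sum(utf16_len(ord(c)) for c in text)
    # The hand-computed totals should match the real encoders.
    assert u8 == len(text.encode("utf-8"))
    assert u16 == len(text.encode("utf-16-le"))
    print(f"{name}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

For the English sample UTF-8 needs 12 bytes against 24 for UTF-16, the Greek sample comes out equal, and for the Chinese sample it is 12 against 8.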


--
John W Kennedy

"You can, if you wish, class all science-fiction together; but it is about as perceptive as classing the works of Ballantyne, Conrad and W. W. Jacobs together as the 'sea-story' and then criticizing _that_."

  -- C. S. Lewis.  "An Experiment in Criticism"




