I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses FEWER characters in writing to communicate information than alphabet-based languages.
For my commercial Japanese-to-English translation work, I estimate from 2.3 to 3.2 Japanese characters per word of English, taking an English word as 6 characters. It varies depending on the kanji-to-kana ratio in the source text.
For commercial contemporary Chinese-to-English translation, I estimate 1.4 to 1.8 Chinese characters per English word, again taking an English word as 6 characters. (I just asked about this on a mailing list devoted to C-E/E-C translation, and the one translator who responded said he uses 1.62 Chinese characters per English word, which agrees with my experience.)
Since a "word" is probably about the smallest chunk of meaning you're going to find, this would suggest that where it takes 6 bytes to encode a word of English at one byte per character, it will take from about 4.2 to 5.4 bytes (1.4 to 1.8 characters at 3 bytes per character) to encode a word of Chinese, I guess.
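
The arithmetic can be checked with a short Python sketch: bytes per English-word equivalent is characters per word times bytes per character. The ratios and the 6-character English word are the estimates given above, not measurements, and the names (`utf8_bytes_per_word`, `CHARS_PER_ENGLISH_WORD`) are made up for illustration. It assumes every Japanese or Chinese character in the text takes 3 bytes in UTF-8, which holds for kana and the common kanji/hanzi but not for rarer characters outside the Basic Multilingual Plane.

```python
# Bytes per character in UTF-8, measured on sample characters.
BYTES_PER_CJK_CHAR = len("語".encode("utf-8"))    # common kanji/hanzi and kana: 3 bytes
BYTES_PER_ASCII_CHAR = len("a".encode("utf-8"))   # ASCII letters: 1 byte

# Assumption taken from the estimates above: one English word is about 6 characters.
ENGLISH_CHARS_PER_WORD = 6

# Characters per English word as (low, high), from the translation estimates above.
CHARS_PER_ENGLISH_WORD = {
    "Japanese": (2.3, 3.2),
    "Chinese": (1.4, 1.8),
}


def utf8_bytes_per_word(chars_per_word, bytes_per_char=BYTES_PER_CJK_CHAR):
    """Return the UTF-8 bytes needed for one English-word equivalent."""
    return chars_per_word * bytes_per_char


english_bytes = ENGLISH_CHARS_PER_WORD * BYTES_PER_ASCII_CHAR
print(f"English: {english_bytes} bytes per word")

for language, (low, high) in CHARS_PER_ENGLISH_WORD.items():
    low_bytes = utf8_bytes_per_word(low)
    high_bytes = utf8_bytes_per_word(high)
    print(f"{language}: {low_bytes:.1f} to {high_bytes:.1f} bytes per English-word equivalent")
```

On these estimates, Chinese comes out below the 6 bytes of English, while Japanese, with its higher character count per word, comes out above it.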
The above applies to contemporary (modern) traditional Chinese. I don't know if there is a practical difference in efficiency between traditional and simplified. But from my experience with classical Chinese, I would guess that most classical Chinese is at least twice as efficient as modern Chinese. (This, plus its freedom from any tight dependence on sound, facilitated its great success as the language of culture throughout the traditional kanji culture realm: China, Korea, Japan, Vietnam, etc., imo.)
FWIW,
Jon
-- Jon Babcock <[EMAIL PROTECTED]>

